@vejvarm
Last active October 30, 2024 07:21
Installing MS-AMP to a local conda environment (needs sudo rights)
# create new local conda env
conda create --prefix ./.conda python=3.11
conda activate ./.conda
# https://azure.github.io/MS-AMP/docs/getting-started/installation/
git clone https://github.com/Azure/MS-AMP.git
cd MS-AMP
git submodule update --init --recursive
cd third_party/msccl
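# build MSCCL for your GPU architecture: run only ONE of the two make commands
# A100
make -j src.build NVCC_GENCODE="-gencode=arch=compute_80,code=sm_80"
# H100 (use this line instead of the A100 line)
# make -j src.build NVCC_GENCODE="-gencode=arch=compute_90,code=sm_90"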
sudo apt-get update
sudo apt-get install build-essential devscripts debhelper fakeroot
make pkg.debian.build
sudo dpkg -i build/pkg/deb/libnccl2_*.deb
sudo dpkg -i build/pkg/deb/libnccl-dev_2*.deb
cd -
pip install --upgrade pip
# https://github.com/mpi4py/mpi4py/discussions/236
conda install mpi4py
pip install .
# `make postinstall` does not work here: it partially requires sudo and then breaks on pip install
# do the following instead:
cd msamp/operators/dist_op && sudo bash build.sh && cd -
cd msamp/operators/arithmetic && pip install -v . && cd -
cd msamp/optim && pip install -v . && cd -
# copy the MSCCL-built libnccl to /usr/local/lib so that libnccl.so is accessible
cd third_party/msccl/build/lib && sudo cp libnccl.so libnccl.so.2 libnccl.so.2.17.1 /usr/local/lib/ && cd -
# finally, preload the MS-AMP dist library together with NCCL
NCCL_LIBRARY=/usr/lib/x86_64-linux-gnu/libnccl.so # Change as needed
export LD_PRELOAD="/usr/local/lib/libmsamp_dist.so:${NCCL_LIBRARY}:${LD_PRELOAD}"
# test the runtime
python3 -c "import msamp; print(msamp.__version__)"
# if the import above fails, try upgrading deepspeed:
pip install --upgrade deepspeed
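
The LD_PRELOAD step above accepts paths that do not exist: the dynamic loader only prints a warning when a program starts and then ignores the entry. A minimal sketch for checking the paths first, assuming bash; `build_preload` is a hypothetical helper name and is not part of MS-AMP:

```shell
# build_preload (hypothetical helper, not part of MS-AMP):
# joins the given library paths with ':' and fails if any path
# is not an existing file (symlinks to files are accepted).
build_preload() {
    local result=""
    local lib
    for lib in "$@"; do
        if [ ! -f "$lib" ]; then
            echo "missing library: $lib" >&2
            return 1
        fi
        # append with a ':' separator only when result is non-empty
        result="${result:+$result:}$lib"
    done
    printf '%s\n' "$result"
}

# Usage (paths taken from the steps above; change NCCL_LIBRARY as needed):
# NCCL_LIBRARY=/usr/lib/x86_64-linux-gnu/libnccl.so
# preload="$(build_preload /usr/local/lib/libmsamp_dist.so "$NCCL_LIBRARY")" \
#     && export LD_PRELOAD="${preload}${LD_PRELOAD:+:$LD_PRELOAD}"
```

If either file is missing, the helper names it on stderr and LD_PRELOAD is left unchanged, which is usually easier to debug than a failure inside `import msamp`.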

vejvarm commented Oct 30, 2024

When running python3 -c "import msamp; print(msamp.__version__)", you may get ModuleNotFoundError: No module named 'transformer_engine_extensions'. In that case, refer to the thread titled "Cannot import and use transformer_engine after successful installation".
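
To narrow down which package is actually failing, each module can be imported on its own. A minimal sketch, assuming bash and that `python3` resolves to the interpreter of the active conda env; `check_imports` is a hypothetical helper, not part of MS-AMP:

```shell
# check_imports (hypothetical helper): tries to import each named Python
# module in a fresh interpreter and reports which ones fail.
# Returns non-zero if at least one import failed.
check_imports() {
    local failed=0
    local mod
    for mod in "$@"; do
        if python3 -c "import $mod" >/dev/null 2>&1; then
            echo "ok      $mod"
        else
            echo "FAILED  $mod"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (module names taken from the steps and the error message above):
# check_imports deepspeed transformer_engine transformer_engine_extensions msamp
```

Run it in the same shell where LD_PRELOAD was exported, since the preloaded libraries affect whether the imports succeed.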
