Instructions for CUDA v11.8 and cuDNN 8.7 installation on Ubuntu 22.04 for PyTorch 2.0.0
#!/bin/bash
### steps ####
# verify the system has a cuda-capable gpu
# download and install the nvidia cuda toolkit and cudnn
# setup environmental variables
# verify the installation
###
### to verify that your GPU is CUDA-capable, check
lspci | grep -i nvidia
### If you have a previous installation, remove it first.
# quote the patterns so the shell does not expand them against local file names
sudo apt purge 'nvidia*' -y
sudo apt remove 'nvidia-*' -y
sudo rm -f /etc/apt/sources.list.d/cuda*
sudo apt autoremove -y && sudo apt autoclean -y
sudo rm -rf /usr/local/cuda*
# system update
sudo apt update && sudo apt upgrade -y
# install other important packages
sudo apt install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev
# first add the graphics-drivers PPA repository
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
# find recommended driver versions for you
ubuntu-drivers devices
# install nvidia driver with dependencies
sudo apt install libnvidia-common-515 libnvidia-gl-515 nvidia-driver-515 -y
# reboot
sudo reboot now
# verify that the following command works
nvidia-smi
# add the NVIDIA CUDA apt repository
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"
# Update and upgrade
sudo apt update && sudo apt upgrade -y
# installing CUDA-11.8
sudo apt install cuda-11-8 -y
# setup your paths
echo 'export PATH=/usr/local/cuda-11.8/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
sudo ldconfig
# install cuDNN v8.7 (the build for CUDA 11.x)
# First register here: https://developer.nvidia.com/developer-program/signup
CUDNN_TAR_FILE="cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz"
sudo wget "https://developer.download.nvidia.com/compute/redist/cudnn/v8.7.0/local_installers/11.8/${CUDNN_TAR_FILE}"
sudo tar -xvf "${CUDNN_TAR_FILE}"
sudo mv "${CUDNN_TAR_FILE%.tar.xz}" cuda
# copy the following files into the cuda toolkit directory.
# cuDNN 8.x splits its API across several cudnn*.h headers, so copy all of them
sudo cp -P cuda/include/cudnn*.h /usr/local/cuda-11.8/include
sudo cp -P cuda/lib/libcudnn* /usr/local/cuda-11.8/lib64/
sudo chmod a+r /usr/local/cuda-11.8/include/cudnn*.h /usr/local/cuda-11.8/lib64/libcudnn*
# Finally, to verify the installation, check
nvidia-smi
nvcc -V
# install PyTorch (an open source machine learning framework)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
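
The script checks the driver and the toolkit with nvidia-smi and nvcc, but it does not check cuDNN. A minimal sketch of one way to do that is to read the version macros from cudnn_version.h. The function name `cudnn_version` and the default header path are assumptions for illustration; the path only applies if that header was copied into the toolkit directory, otherwise pass the path to the extracted archive instead.

```shell
#!/bin/bash
# Print the cuDNN version recorded in a cudnn_version.h header.
# The default path is an assumption based on the install location used in the script.
cudnn_version() {
    local header="${1:-/usr/local/cuda-11.8/include/cudnn_version.h}"
    [ -r "$header" ] || { echo "header not found: $header" >&2; return 1; }
    # The header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integers.
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$header"
}
```

Calling `cudnn_version cuda/include/cudnn_version.h` from the download directory reads the header straight from the extracted archive.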
@toebee82

Fantastic! Thanks so much! Had to do it a couple times. Ended up just replacing 515 with 535 for the NVIDIA drivers and it worked!

Avoided some of the other elaborate schemes - including NVIDIA's own, very confusing and lengthy guide.

@noamsgl

noamsgl commented Aug 30, 2023

Worked like a charm (with the fix mentioned by @filmo and @mkabatek)

@wassname

wassname commented Oct 6, 2023

Perhaps this regex would work better, since it also catches libnvidia, kernel modules, etc.

sudo apt-get purge '.*nvidia.*'
sudo apt remove '.*nvidia.*'

@qinchuanhui

I just used the 535 version of the NVIDIA drivers mentioned by @toebee82. When I ran nvidia-smi after the whole installation, it showed "Failed to initialize NVML: Driver/library version mismatch".
Then I rebooted the machine and everything worked, but with driver version 520 (not 535). I guess that is the version that matches the CUDA 11.8 runtime toolkit.
Btw, about the different CUDA versions shown by nvidia-smi and nvcc, there's an answer: https://stackoverflow.com/questions/53422407/different-cuda-versions-shown-by-nvcc-and-nvidia-smi

@rsmath

rsmath commented Nov 21, 2023

One should never do this
sudo rm -rf /usr/local/cuda*
Apt gets confused about what it expects to be there and what is actually there. If something needs to be removed, use apt purge, similar to pip uninstall.
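
A rough sketch of what the apt-based cleanup could look like: list what dpkg knows about first, then purge only that. The function names and the name patterns in `filter_gpu_packages` are illustrative assumptions, not an exhaustive list of NVIDIA packages.

```shell
#!/bin/bash
# Filter a list of package names (one per line on stdin) down to CUDA/NVIDIA-related ones.
# The name patterns are an illustrative guess, not an exhaustive list.
filter_gpu_packages() {
    grep -E '^(cuda|libcudnn|libnvidia|nvidia)' || true
}

# List GPU-related packages known to dpkg.
list_installed_gpu_packages() {
    dpkg-query -W -f='${Package}\n' | filter_gpu_packages
}

# Usage (review the list before purging anything):
#   list_installed_gpu_packages
#   list_installed_gpu_packages | xargs -r sudo apt purge -y
```

Keeping the filter separate from the dpkg call makes it easy to inspect the list before anything is removed.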

@wbreslin951

Does not work with 545 drivers. I just used the 515 drivers in the command (which show up as 525 in nvidia-smi?) but it seems to be working now. Thanks for the thread. I've been through every tutorial and this is the only one that's been successful.

@rsmath

rsmath commented Nov 26, 2023

@wbreslin951, curious, do your nvidia-smi and nvcc --version show the same cuda version being used? If so, which version is it?

@cirr8765

cirr8765 commented Jan 3, 2024

SOLVED: https://forums.developer.nvidia.com/t/ubuntu-cuda-11-8-package-wrong-dependency-on-cuda-drivers/238891
When running sudo apt install cuda -y you can specify the current nvidia driver version, preventing the installer from upgrading:

sudo apt install cuda-11-8 cuda-drivers=535.129.03-1

I need to run the 535 drivers, but after sudo apt install cuda-11-8 -y it automatically switches over to 545 which then causes:

$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 545.23

But when I go into "Software and Updates" and try to switch back, it complains about unmet dependencies, and all files in /usr/local/cuda-11.8/ except for ./targets/ are automatically deleted at this stage!?
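
To avoid typing the driver version by hand, the pin can be built from whatever driver is currently running. This is only a sketch: the helper name `cuda_drivers_pin` is made up, and the "-1" package revision suffix is copied from the example above as an assumption, so check `apt-cache madison cuda-drivers` for the versions your repository actually offers.

```shell
#!/bin/bash
# Build the "cuda-drivers=<version>" argument for apt from a driver version string.
# The "-1" revision suffix follows the 535.129.03-1 example and is an assumption.
cuda_drivers_pin() {
    local driver_version="$1"
    [ -n "$driver_version" ] || { echo "usage: cuda_drivers_pin <driver-version>" >&2; return 1; }
    printf 'cuda-drivers=%s-1\n' "$driver_version"
}

# Usage, reading the running driver version from nvidia-smi:
#   current=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   sudo apt install cuda-11-8 "$(cuda_drivers_pin "$current")"
```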

@cirr8765

cirr8765 commented Jan 3, 2024

-> sudo cp -P cuda/lib/libcudnn* /usr/local/cuda-11.8/lib64/
cp: target '/usr/local/cuda-11.8/lib64/' is not a directory

fix: mkdir /usr/local/cuda-11.8/lib64

and if

priyamshah@priyamshah-System-Product-Name:~$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.86

do a sudo reboot

this fixes nvidia-smi

but nvcc -V is broken

on trying sudo apt install nvidia-cuda-toolkit nvidia-cuda-toolkit-gcc

it says

The following packages have unmet dependencies:
 libcuinj64-11.5 : Depends: libnvidia-compute-495 (>= 495) but it is not going to be installed or
                            libnvidia-compute-495-server (>= 495) but it is not installable or
                            libcuda.so.1 (>= 495) or
                            libcuda-11.5-1
 libnvidia-ml-dev : Depends: libnvidia-compute-495 (>= 495) but it is not going to be installed or
                             libnvidia-compute-495-server (>= 495) but it is not installable or
                             libnvidia-ml.so.1 (>= 495)
 nvidia-cuda-dev : Depends: libnvidia-compute-495 (>= 495) but it is not going to be installed or
                            libnvidia-compute-495-server (>= 495) but it is not installable or
                            libcuda.so.1 (>= 495) or
                            libcuda-11.5-1
                   Recommends: libnvcuvid1 but it is not installable

follow this link https://stackoverflow.com/questions/66380789/nvidia-driver-installation-unmet-dependencies [Unchecking the cuda repo from Software & Updates did the trick.]

then try again sudo apt install nvidia-cuda-toolkit nvidia-cuda-toolkit-gcc

this should fix nvcc -V

I am not sure this is a good solution, as you have just installed the cuda-toolkit and if you do this you risk running into dependency problems. The problem could just be, as it was for me, that you didn't successfully add /usr/local/cuda-11.8/bin to $PATH. First take a look in /usr/local/cuda-11.8/bin. If nvcc is in there, just try to add it again, i.e. run

export PATH=/usr/local/cuda-11.8/bin:$PATH

and check your path with echo $PATH to see if it's in there. If this works, simply add the export line at the bottom of your ~/.bashrc to make it permanent.
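
The same check can be wrapped in a small function so the directory is only added when nvcc is really there and it is not on the path already. This is a sketch; the function name `add_cuda_bin_to_path` is hypothetical and the default directory is the one used in the script.

```shell
#!/bin/bash
# Prepend a directory to PATH only if it contains an executable nvcc
# and is not already listed.
add_cuda_bin_to_path() {
    local bin_dir="${1:-/usr/local/cuda-11.8/bin}"
    [ -x "$bin_dir/nvcc" ] || { echo "no nvcc in $bin_dir" >&2; return 1; }
    case ":$PATH:" in
        *":$bin_dir:"*) ;;                  # already present, nothing to do
        *) export PATH="$bin_dir:$PATH" ;;
    esac
}
```

Because the function refuses to touch PATH when nvcc is missing, a failure here points at the toolkit install rather than at the shell configuration.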

@joseagraz

Thanks for such great tutorial, made my own referencing yours
https://github.com/Kidney-Science/install_RTXA4000_Driver_CUDA_cudNN_Ubuntu_22

@samuponz

@filmo "Not sure why it says CUDA 12.2 instead of 11.8 in nvidia-smi?? Perhaps this is only related to the graphics driver??"

You are right. nvidia-smi shows the latest version of CUDA supported by your GPU drivers, not the installed version of CUDA. Check this.
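
To see both numbers side by side, something like the following sketch can pull the version out of each tool's output. The function names are made up for illustration, and the patterns assume the usual "CUDA Version: X.Y" header of nvidia-smi and the "release X.Y," line of nvcc -V.

```shell
#!/bin/bash
# Extract the "CUDA Version" field from `nvidia-smi` output on stdin.
# This is the highest CUDA version the installed driver supports.
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Extract the toolkit release from `nvcc -V` output on stdin.
toolkit_cuda_version() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Usage:
#   echo "driver supports up to: $(nvidia-smi | driver_cuda_version)"
#   echo "toolkit installed:     $(nvcc -V | toolkit_cuda_version)"
```

A higher number from nvidia-smi than from nvcc is expected and is not by itself a sign of a broken install.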

@zizimars

Thank you! After lots of days, it works! Instead of 515, I put 525.

@kazmifactor

kazmifactor commented Feb 7, 2024

I get this error even after trying the fixes given by @filmo and @mkabatek.


inp@inp-Z790-GAMING-X:~$ sudo apt install cuda-11-8
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 cuda-11-8 : Depends: cuda-runtime-11-8 (>= 11.8.0) but it is not going to be installed
             Depends: cuda-demo-suite-11-8 (>= 11.8.86) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

I have even tried installing 515, 525 and 535.

I have installed Ubuntu 20.04.
Can anyone please help? @MihailCosmin @filmo @mkabatek
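
One thing that may be worth checking here: the script hardcodes the ubuntu2204 repository, while this machine runs Ubuntu 20.04. Whether that explains the unmet dependencies is an assumption, not something confirmed in this thread. The sketch below derives the repository URL from the release number; `cuda_repo_url` is a hypothetical helper and the URL layout simply mirrors the ubuntu2204 address in the script.

```shell
#!/bin/bash
# Map an Ubuntu VERSION_ID (e.g. "22.04") to the matching CUDA repository URL.
# The URL layout mirrors the ubuntu2204 address used in the script.
cuda_repo_url() {
    local version_id="$1"
    local arch="${2:-x86_64}"
    local tag
    tag="ubuntu$(printf '%s' "$version_id" | tr -d '.')"   # "22.04" -> "ubuntu2204"
    printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/%s/\n' "$tag" "$arch"
}

# Usage on the machine being set up:
#   . /etc/os-release && cuda_repo_url "$VERSION_ID"
```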

@joseagraz

I ran into the same error very early on. Try my recipe at the link below on a fresh Ubuntu copy. The recipe was tested on different PC, but using the same GPU. Good luck!
https://github.com/Kidney-Science/install_RTXA4000_Driver_CUDA_cudNN_Ubuntu_22

@SonOfSkywalker

Thank you so much, this script is marvelous !

@hawkiyc

hawkiyc commented Apr 19, 2024

Gosh, you saved my day. I finally solved my computing env. Thank you very much.

@aidv

aidv commented May 18, 2024

I ran into the same error very early on. Try my recipe at the link below on a fresh Ubuntu copy. The recipe was tested on different PC, but using the same GPU. Good luck! https://github.com/Kidney-Science/install_RTXA4000_Driver_CUDA_cudNN_Ubuntu_22

Your link doesn't work. It takes me to a 404 page.

@aidv

aidv commented May 18, 2024

I've been STRUGGLING with my QEMU KVM VMs that seemingly out of nowhere refused to see my GPUs.

After a few days of fiddling around I eventually figured out a solution.

  1. Turn off Secure Boot in the VM BIOS
  2. Purge all nvidia related stuff
  3. Install version 520 of the graphics driver
  4. Install CUDA 11.8

I created a script to make my life easier:

IMPORTANT!
⚠️ 🗣️ TURN OFF SECURE BOOT IN YOUR VM!!! 🗣️ ⚠️

Script:

#!/bin/bash

#install graphics driver 520 specifically
sudo apt purge 'nvidia*' -y
sudo apt autoremove -y && sudo apt autoclean -y
sudo apt update && sudo apt upgrade -y
sudo apt install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install libnvidia-common-520 libnvidia-gl-520 nvidia-driver-520 -y

#install cuda 11.8
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-ubuntu2204-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
rm cuda-ubuntu2204.pin
rm cuda-repo-ubuntu2204-11-8-local_11.8.0-520.61.05-1_amd64.deb

echo "Install completed! Run: 'sudo reboot now' and then after reboot run 'nvidia-smi' and 'nvtop' to confirm that the GPU is recognized."
  1. Copy and paste the above into a file called `tryfixgpu.sh`
  2. Then run sudo chmod +x tryfixgpu.sh
  3. Then run the script sudo ./tryfixgpu.sh

⚠️ 🗣️ AND REMEMBER TO TURN OFF SECURE BOOT!!!!!! 🗣️ ⚠️
⚠️ 🗣️ AND REMEMBER TO TURN OFF SECURE BOOT!!!!!! 🗣️ ⚠️
⚠️ 🗣️ AND REMEMBER TO TURN OFF SECURE BOOT!!!!!! 🗣️ ⚠️
⚠️ 🗣️ AND REMEMBER TO TURN OFF SECURE BOOT!!!!!! 🗣️ ⚠️
⚠️ 🗣️ AND REMEMBER TO TURN OFF SECURE BOOT!!!!!! 🗣️ ⚠️
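
Whether Secure Boot is actually off can be checked from inside the VM before running anything. A minimal sketch, assuming the mokutil tool is available and prints "SecureBoot enabled" or "SecureBoot disabled"; the function name `secure_boot_enabled` is made up for illustration.

```shell
#!/bin/bash
# Interpret the output of `mokutil --sb-state` (read from stdin).
# Returns 0 when Secure Boot is reported as enabled, 1 otherwise.
secure_boot_enabled() {
    grep -qi 'SecureBoot enabled'
}

# Usage (mokutil may need to be installed first: sudo apt install mokutil):
#   if mokutil --sb-state | secure_boot_enabled; then
#       echo "Secure Boot is ON: turn it off in the VM firmware settings first."
#   fi
```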

@D1st3f

D1st3f commented Sep 17, 2024

Install version 520 of the graphics driver

Your sh is installing 535

The following NEW packages will be installed:
dctrl-tools dkms libnvidia-cfg1-535 libnvidia-common-520 libnvidia-common-535 libnvidia-decode-535
libnvidia-encode-535 libnvidia-extra-535 libnvidia-fbc1-535 libnvidia-gl-520 libnvidia-gl-535
nvidia-compute-utils-535 nvidia-dkms-535 nvidia-driver-520 nvidia-driver-535 nvidia-firmware-535-535.183.01
nvidia-kernel-common-535 nvidia-kernel-source-535 nvidia-prime nvidia-settings nvidia-utils-535 pkg-config
python3-xkit screen-resolution-extra xserver-xorg-video-nvidia-535
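
A dry run makes it possible to spot this before anything is installed. The sketch below pulls the driver branch numbers out of a package list like the one above; `driver_branches` is a hypothetical helper name, and the usage line relies on the "Inst <package>" lines that apt-get prints in simulate mode.

```shell
#!/bin/bash
# Read package names (whitespace-separated, on stdin) and print the distinct
# driver branch numbers of any nvidia-driver-NNN packages, one per line.
driver_branches() {
    tr -s ' \t' '\n' | sed -n 's/^nvidia-driver-\([0-9][0-9]*\)$/\1/p' | sort -u
}

# Usage: dry-run the install and see which branches apt would pull in.
#   apt-get install --simulate nvidia-driver-520 | awk '/^Inst /{print $2}' | driver_branches
```

More than one line of output means apt intends to install more than one driver branch.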

@aidv

aidv commented Oct 18, 2024

Your sh is installing 535

Weird. For me it’s installing 520
