This is a companion piece to my instructions on building TensorFlow from source. The aim is to install the following pieces of software
- NVIDIA graphics card driver (v450.57)
- CUDA (v11.0.2)
- cuDNN (v8.0.2.39)
on an Ubuntu Linux system, in particular Ubuntu 20.04.
At the time of writing (2020-08-06), these were the latest available versions. As a disclaimer, please note that I am not interested in running an outdated Ubuntu version or installing a CUDA/cuDNN version that is not the latest. Therefore, the instructions below may or may not be useful to you. Please also note that the instructions are likely outdated, since I only update them occasionally. Don't just copy these instructions; check what the respective latest versions are and use those instead!
Download and install the latest NVIDIA graphics driver from here: https://www.nvidia.com/en-us/drivers/unix/. Note that every CUDA version requires a minimum version of the driver; check this beforehand.
Ubuntu 20.04 currently offers installation of the NVIDIA driver version 440.100 through its built-in 'Additional Drivers' mechanism, which should be sufficient for CUDA 10.2. CUDA 11.0 appears to require a newer version of the NVIDIA driver, so we're going to install this manually.
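To check the currently installed driver against the minimum a CUDA release needs, something like the following sketch might help. The helper name version_ge is made up for this example, and the required version 450.51.05 is an assumption taken from the driver version embedded in the CUDA 11.0.2 runfile name; check NVIDIA's release notes for the authoritative minimum. It relies on GNU sort -V, which Ubuntu ships.

```shell
# version_ge A B: succeeds when version A >= version B (needs GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: minimum driver taken from the CUDA 11.0.2 runfile name.
required="450.51.05"

if command -v nvidia-smi >/dev/null 2>&1; then
    # Ask the driver for its own version; one line per GPU, keep the first.
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if version_ge "$installed" "$required"; then
        echo "driver $installed is new enough for $required"
    else
        echo "driver $installed is older than $required"
    fi
else
    echo "nvidia-smi not found; no NVIDIA driver appears to be installed"
fi
```

With the versions mentioned above, 440.100 would fail this check and 450.57 would pass it.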
Once downloaded, run the installer:
$ sudo sh NVIDIA-Linux-x86_64-450.57.run
The CUDA runfile also includes a version of the NVIDIA graphics driver, but I prefer to install the two separately, as the version supplied with CUDA is not necessarily the latest version of the driver.
Download the latest CUDA version here. For example, I downloaded:
$ wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda_11.0.2_450.51.05_linux.run
Thankfully, CUDA 11 currently supports the up-to-date Ubuntu version, 20.04, so we don't need to jump through hoops to deal with an unsupported GNU version error as in previous versions of this document. Simply install as per the official instructions:
$ sudo sh cuda_11.0.2_450.51.05_linux.run
You may need to confirm that the display driver is already installed, and de-select installation of the display driver.
Once finished, you should see a summary like this:
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.0/
Samples: Installed in /home/michael/, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-11.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.0/lib64, or, add /usr/local/cuda-11.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-11.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least .00 is required for CUDA 11.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Do what the instructions given in the summary say and add the given directories to your PATH and LD_LIBRARY_PATH. For example, add the following lines to your .bashrc, .zshrc, or the startup file of whatever shell you are using:
export PATH=/usr/local/cuda-11.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64:$LD_LIBRARY_PATH
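If the startup file gets sourced more than once, the two exports above add the same directories again each time, and an empty LD_LIBRARY_PATH leaves a trailing colon. A possible refinement, as a minimal sketch: the helper below (prepend_once is a hypothetical name, not something CUDA provides) prepends a directory only when it is not already present. The directories are the ones from the installer summary; adjust them if your install location differs.

```shell
# prepend_once VAR DIR: prepend DIR to the colon-separated variable VAR,
# unless DIR is already one of its entries.
prepend_once() {
    var_name="$1"
    dir="$2"
    # Read the current value of the variable whose name is in var_name.
    eval "current=\${$var_name:-}"
    case ":$current:" in
        *":$dir:"*)
            ;;  # already present, nothing to do
        *)
            # Add a separating colon only if the variable was non-empty.
            eval "export $var_name=\"\$dir\${current:+:\$current}\""
            ;;
    esac
}

prepend_once PATH /usr/local/cuda-11.0/bin
prepend_once LD_LIBRARY_PATH /usr/local/cuda-11.0/lib64
```

After opening a new shell, running nvcc --version should then find the compiler under /usr/local/cuda-11.0/bin.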
Just go here and follow the instructions. You'll have to log in, so downloading of the right cuDNN binary packages cannot be easily automated. Meh.
Once downloaded, un-tar the file and copy the contents to their respective locations:
$ tar -xzvf cudnn-11.0-linux-x64-v8.0.2.39.tgz
$ sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
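To confirm that the copy worked, one option is to read the version back out of the installed headers. This is a sketch under the assumption that, as in cuDNN 8, the version macros live in cudnn_version.h; the function name cudnn_version is made up for this example. The header only carries three components, so for the package above it should report 8.0.2 rather than 8.0.2.39.

```shell
# cudnn_version HEADER: print MAJOR.MINOR.PATCHLEVEL parsed from a cuDNN
# version header; prints nothing if the macros are not found.
cudnn_version() {
    awk '/^#define CUDNN_MAJOR /      { major = $3 }
         /^#define CUDNN_MINOR /      { minor = $3 }
         /^#define CUDNN_PATCHLEVEL / { patch = $3 }
         END { if (major != "") print major "." minor "." patch }' "$1"
}

# Assumption: headers were copied to /usr/local/cuda/include as shown above.
header=/usr/local/cuda/include/cudnn_version.h
if [ -r "$header" ]; then
    echo "cuDNN $(cudnn_version "$header")"
else
    echo "cuDNN version header not found at $header"
fi
```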
Pro tip: once you get a working system - make a backup using timeshift
https://github.com/teejee2008/timeshift
Work files are deliberately excluded, and that way you can roll back the OS / nvidia drivers etc. to a previous working restore point. The restore process will prompt you with a list of files that will be deleted / recovered. I have used it 5-6 times this year, as some frameworks / machine learning projects error with the latest RTX 30X0 hardware. Some need gcc 10.2 but the OS wants to update to 10.3 (which breaks cuda toolkit 11.4 for some ML stuff) - wasted so many hours faffing around - when all you want to do is have it work. I've also had cudnn bizarrely go missing - even though it's clearly in the checkpoint where you can browse files and restore them.
nvidia Driver downgraded - which has forced me to disable updates.
I have pytorch 1.8 + cudatoolkit 11.4 / 470 driver all working for me (today).
(As nvidia labs has abandoned tensorflow in favour of pytorch - I've steered away from those projects.)