@ax3l
Last active January 7, 2025 02:15
CUDA Compilers

In general, check the crt/host_config.h file to find out which versions are supported. Sometimes it is possible to hack the requirements there to get some newer versions working, too :)
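As a rough illustration of what "checking crt/host_config.h" can look like when automated, the sketch below searches the header text for a guard of the form `#if __GNUC__ > N`. The function name `max_gcc_major` is hypothetical, and the pattern is an assumption based on the guard used by recent toolkits; older releases phrase the check differently, so treat a miss as "unknown" rather than "unsupported".

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, not part of the CUDA Toolkit.
// Scans the text of crt/host_config.h for a guard of the form
// "#if __GNUC__ > N" and returns N, i.e. the newest GCC major version
// that passes the check. Returns -1 when no such guard is found.
inline int max_gcc_major(const std::string& header_text) {
    // Assumed guard shape; tolerant of extra whitespace around the tokens.
    static const std::regex guard(R"(#\s*if\s+__GNUC__\s*>\s*(\d+))");
    std::smatch match;
    if (std::regex_search(header_text, match, guard)) {
        return std::stoi(match[1].str());
    }
    return -1;
}
```

For example, `assert(max_gcc_major("#if __GNUC__ > 10") == 10);` holds, while a header without such a guard yields `-1`.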

The Thrust version can be found in $CUDA_ROOT/include/thrust/version.h.
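That header stores the version as a single integer, `THRUST_VERSION`, which packs major, minor and subminor parts together. The sketch below decodes such a value using the same arithmetic the header's own `THRUST_MAJOR_VERSION`, `THRUST_MINOR_VERSION` and `THRUST_SUBMINOR_VERSION` macros use; the struct and function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical holder for a decoded Thrust version.
struct ThrustVersion {
    int major_version;
    int minor_version;
    int subminor_version;
};

// Decodes a THRUST_VERSION-style integer:
//   major    = value / 100000
//   minor    = value / 100 % 1000
//   subminor = value % 100
inline ThrustVersion decode_thrust_version(int encoded) {
    assert(encoded >= 0);  // version numbers are never negative
    ThrustVersion v;
    v.major_version = encoded / 100000;
    v.minor_version = encoded / 100 % 1000;
    v.subminor_version = encoded % 100;
    return v;
}
```

With this encoding, the value 100909 decodes to 1.9.9 and 200001 decodes to 2.0.1.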

Download Archives: https://developer.nvidia.com/cuda-toolkit-archive

Release notes for CUDA Toolkit (CTK):

Version notes Nvidia HPC SDK:

Compatibility Guarantees

Quote:

  • CUDA 10.0: First introduced in CUDA 10, the CUDA Forward Compatible Upgrade is designed to allow users to get access to new CUDA features and run applications built with new CUDA releases on systems with older installations of the NVIDIA datacenter GPU driver.
  • CUDA 11.1: First introduced in CUDA 11.1, CUDA Enhanced Compatibility provides two benefits:
    • By leveraging semantic versioning across components in the CUDA Toolkit, an application can be built for one CUDA minor release (such as 11.1) and work across all future minor releases within the major family (such as 11.x).
    • CUDA has relaxed the minimum driver version check and thus no longer requires a driver upgrade with minor releases of the CUDA Toolkit.
  • CUDA 12.4: Starting in CUDA 12.4, the NVIDIA driver installation on Linux will be opt-in. The goal is to improve user experience for a wide range of use cases such as installing the open module flavor driver. The cuda-runtime dependency and therefore the cuda-drivers (NVIDIA driver) dependency will be removed from the top-level cuda meta-package. Effectively, the cuda and cuda-toolkit meta-packages will be equivalent in CUDA 12.4.
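The first Enhanced Compatibility benefit quoted above can be read as a simple rule on version numbers. The sketch below encodes only that rule, under the assumption that "all future minor releases within the major family" means same major version and an equal or newer minor version, starting with 11.1. It is a simplification for illustration and says nothing about driver requirements or features that need a newer toolkit; all names are hypothetical.

```cpp
#include <cassert>

// Hypothetical pair of CUDA Toolkit version numbers.
struct CudaVersion {
    int major_version;
    int minor_version;
};

// Returns true when an application built with `built_with` falls under the
// quoted Enhanced Compatibility rule for a system providing `installed`:
// the guarantee starts with 11.1 and stays within one major family.
inline bool covered_by_enhanced_compatibility(CudaVersion built_with,
                                              CudaVersion installed) {
    const bool has_guarantee =
        built_with.major_version > 11 ||
        (built_with.major_version == 11 && built_with.minor_version >= 1);
    return has_guarantee &&
           installed.major_version == built_with.major_version &&
           installed.minor_version >= built_with.minor_version;
}
```

Under these assumptions, a build against 11.1 is covered on an 11.4 installation, but not on a 12.0 installation.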

nvcc

Latest official compiler requirements: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

CUDA version SM Arch g++ icpc pgc++ xlC MSVC clang++ Linux driver thrust note
1.0 1.0-1.1 ? ? ?
1.1 1.0-1.1 ? ? ?
2.0 1.0-1.1 ? ? ?
2.1 1.0-1.3 ? ? ?
2.3.1 1.0-1.3 ? ? ?
3.0 1.0-2.0 ? ? ?
3.1 1.0-2.0 ? ? ?
3.2 1.0-2.1 ? 11.1 ?
4.0 1.0-2.1 <=4.4 11.1 ?
4.1 1.0-2.1 <=4.5 11.1 ?
4.2 1.0-2.1 <=4.6 11.1 ?
5.0 1.0-3.? <=4.6 11.1 ? ? 1.5.3
5.5 1.0-3.? <=4.8 12.1 ? ? 1.7.0 C++11 on host side supported; ICC fixed to build 20110811
6.0 1.0-5.0 <=4.8 13.1 ? 331.62 1.7.1
6.5 1.1-5.X <=4.8 14.0 ? ? ? 1.7.2 experimental device-side C++11 support; including this version, <thrust/sort.h> screws up __CUDA_ARCH__ (must be undefined on host); deprecation of SM 11-13 (10 removed)
7.0.17 (RC) s. below <=4.9 15.0 >=14.9 13.1.1 ? 346.29 1.8.0 first official PGI support, first xlc string found; powerpc64 w. little endian supported
7.0.27 2.0-5.X <=4.9 15.0 >=14.9 13.1.1 2010-13 346.46 1.8.1 official C++11 support on device side
7.5 <=4.9 15.0 15.4 13.1 2010-13 3.5-3.6 352.41? 1.8.2 clang (host) on linux supported, __CUDACC_VER__ macros added
7.5.18 2.0-5.X <=4.9 15.0 15.4 13.1 2010-13 352.39 1.8.2
8.0.44 2.0-6.X <=5.3 15.0(.4)-16.0 16(.3)+ 13.1(.2) 2012-15 3.8-3.9 367.48 1.8.3-patch2 sm_60 (pascal) support added
8.0.61 2.0-6.X <=5.3 15.0(.4)-17.0 16(.3)+ 13.1(.2) 2012-15 3.8-3.9 375.26 1.8.3-patch2 nvcc 8 is incompatible with std::tuple in gcc 5.4+
9.0.69 (RC) 3.0-7.0 <=5.5 (<=6) 15.0(.4)-17.0 17 13.1(.2) 2012-17 3.8-3.9 ???.?? 1.9.0-patch4 device-side C++14; __CUDACC_VER__ deprecated for __CUDACC_VER_MAJOR/MINOR/BUILD__
9.0.103 (RC) 3.0-7.0 <=5.5 (<=6) 15.0(.4)-17.0 17 13.1(.2) 2012-17 3.8-3.9 384.59 1.9.0-patch4 same as above, __CUDACC_VER__ defined as #error rendering it fully broken
9.0.176 3.0-7.0 <=5.5 (<=6) (15.0-)17.0 17.1 13.1(.5) 2012-17 (3.8-)3.9 384.81 1.9.0-patch5 same as above
9.1.85 3.0-7.2 <=5.5 (<=6) (15.0-)17.0 17.X 13.1(.6) 2012-17 (3.8-)4.0 390.46 1.9.1-patch2 math_functions.hpp moved to crt/
9.1.85.1 cuBLAS 9.1.128: Volta GEMM kernels optimized
9.1.85.2 ptxas: fix address calculations using large immediate operands
9.1.85.3 cuBLAS: fixes to GEMM optimizations for convolutional sequence to sequence (seq2seq) models.
9.0-9.1 nvcc 9.0-9.1 is incompatible with std::tuple in gcc 6+
9.2.88 3.0-7.2 <=7.3.0 (<=7) (15.0-)17.0 17-18.X 13.1(.6),16.1 2012-17 (3.8-)5.0 396.26 1.9.2 CUTLASS 1.0 added; std::tuple fixed (prior GCC 6 issues)
9.2.148 396.37 1.9.2
10.0.130 3.0-7.5 <=7 (15.0-)18.0 17-18.X 13.1, 16.1 2013-17 (3.8-)6.0 410.48 1.9.3 CUDA Forward Compatible Upgrade
10.1.105 3.0-7.5 <=8 (15.0-)19.0 17-19.X 2013-19 (3.8-)7.0 418.39 1.9.4
10.1.168 (3.8-)8.0 418.67 10.1 "Update 1"
10.1.243 418.87 10.1 "Update 2"
10.2.89 3.0-7.5 <=8 (15.0-)19.0 18-19.X 13.1, 16.1 2015-19 (3.3-)8.X 440.33.01 1.9.7 sm_30,35,37,50 deprecated; nvcc: -allow-unsupported-compiler
11.0.1 (RC) NVCC:11.0.167 3.5-8.0 (5-)6-9.* (15.0-)19.1 18-20.1 13.1, 16.1 2015-19 3.2-9.0.0 450.36.06 1.9.9 macOS dropped; libs drop pre-C++11, deprecate pre-C++14 (GCC < 5, Clang < 6, and MSVC < 2017); Arm C/C++ 19.2 support
11.0.2-1 NVCC:11.0.194 (3.3/)6-9.0.0 450.51.05 nvcc: --Wext-lambda-captures-this
11.0.3 NVCC:11.0.221 ? ? ? ? ? ? ? 450.51.06 ? 11.0 "Update 1"; nvcc: --forward-unknown-to-host-compiler, --forward-unknown-to-host-linker flags
11.1.0 NVCC:11.1.74 3.5-8.6 (5-)6-10.0 (15.0-)19.1 18-20.1 13.1, 16.1 2017-19 (3.3/)6-10.X 455.23.05 1.9.10-1 Ubuntu@ppc64le deprecated; CUDA Enhanced Compatibility
11.1.1 NVCC:11.1.? ? ? ?
11.2.0 NVCC:11.2.67 <12 460.27.04 1.10.0
11.2.1 NVCC:11.2.142 460.32.03 ? "Update 1"
11.2.2 NVCC:11.2.152 460.32.03 ? "Update 2"
11.3.0 NVCC:11.3.58 6.0-10.X 465.19.01 ? cu++flt added, Python Driver/RT bindings, alloca()
11.4.0 NVCC:11.4.48 6.0-11.X <13 470.42.01 ? sm30,32 and Ubuntu 16.04 dropped, C++11 stdlib for math
11.4.1 NVCC:11.4.100 6.0-11.X 470.57.02 ? 11.4 "Update 1", fix g++ 10 issues with chrono headers of libstdc++; Ubuntu 16.04 dropped (x86)
11.4.2 NVCC:11.4.120 3.2-12.X 470.57.02 ? ...
11.5.0 NVCC:11.5.50 6.0-11.X 3.2-12.X 495.29.05 ? ...
11.5.1 NVCC:11.5.119
11.6.0 NVCC:11.6.55 6.0-11.X adds VS2022 3.2-13.X 510.39.01 ? adds -arch=native and PTX generation in nvlink (for LTO workflows with PTX)
11.6.1 NVCC:11.6.112 510.47.03 ?
11.6.2 NVCC:11.6.124 510.47.03 ?
11.7.0 NVCC:11.7.64 ? ? ? ? 515.43.04 ?
11.7.1 NVCC:11.7.99 515.65.01 ?
11.8.0 NVCC:11.8.89 6.0-11.2.1 520.61.05 ?
12.0.0 NVCC:12.0.76 4.0-9.0 6.0-12.1 (12.2.1) 2021.6 22.7 16.1.x -VS2022 17.4 -14.X 525.60.13 2.0.1 C++20 support, Hopper and Lovelace, JIT LTO (nvJitLink lib), NVVM IR 2.0, CUDA-MEMCHECK -> Compute Sanitizer, sm_35/37 dropped in all libs, 32-bit compilation support dropped
12.3.0 NVCC:12.0.76 545.23.06 2.2.0
CUDA version SM g++ icpc pgc++ xlC MSVC clang++ Linux driver thrust note

SM: means SM architecture support.

pgc++: now NVHPC products, e.g., nvc/nvfortran/nvc++.

Note: empty cells generally mean "same as above" for readability.

macOS: As of 7.0, clang seems to be the only supported compiler on OSX (but no version check found). CUDA 10.1.243 adds support for Xcode 10.2. CUDA 11.0 dropped macOS support.

Compilers such as pgC, icc, xlC are only supported on x86 linux and little endian.

Dynamic parallelism was added with sm_35 and CUDA 5.0.

Newer CUDA releases have a per-release support matrix for compilers, which also lists supported kernel and glibc versions: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements
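The table notes that `__CUDACC_VER__` was deprecated in CUDA 9.0 in favor of `__CUDACC_VER_MAJOR__`, `__CUDACC_VER_MINOR__` and `__CUDACC_VER_BUILD__`. A minimal sketch of reading those three macros without breaking plain host builds could look as follows; the function name is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns "major.minor.build" when the translation unit
// is compiled by an nvcc that defines the three __CUDACC_VER_*__ macros, and
// an empty string otherwise (for example under a plain host compiler).
inline std::string nvcc_version_string() {
#if defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__) && \
    defined(__CUDACC_VER_BUILD__)
    return std::to_string(__CUDACC_VER_MAJOR__) + "." +
           std::to_string(__CUDACC_VER_MINOR__) + "." +
           std::to_string(__CUDACC_VER_BUILD__);
#else
    return std::string();
#endif
}
```

Guarding with `defined(...)` avoids touching `__CUDACC_VER__` itself, which some 9.0 builds define as an `#error` according to the table.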

clang++ -x cuda

clang++ can compile CUDA C++ to ptx as well. Give it a whirl!

clang++ supported CUDA release supported SMs
3.9-5.0 7.0-8.0 2.0-(5.0)6.0
6.0 7.0-9.0 (2.0)3.0-7.0
7.0 7.0-9.2 (2.0)3.0-7.2
8.0 7.0-10.0 (2.0)3.0-7.5
9.0 7.0-10.1 (2.0)3.0-7.5
10.0 7.0-10.1 (2.0)3.0-7.5
11.0 7.0-11.0 (2.0)3.0-8.0
12.0 7.0-11.0 (2.0)3.0-8.0
13.0 7.0-11.2 (2.0)3.0-8.6
14.0 7.0-11.5 (2.0)3.0-8.6
15.0 7.0-11.5 (2.0)3.0-8.6
16.0 7.0-11.8 (2.0)3.0-9.0
main 7.0-12.1 (2.0)3.5-9.0

https://llvm.org/docs/CompileCudaWithLLVM.html
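Code that has to build with both front ends sometimes needs to know which one is compiling it. The LLVM page linked above describes the macros for this: clang defines `__clang__` and `__CUDA__` when compiling CUDA, and nvcc can be recognized by `__NVCC__`. The sketch below wraps that check in a hypothetical function; the returned labels are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports which CUDA front end (if any) is compiling
// this translation unit, based on the predefined macros described in the
// LLVM "Compiling CUDA with clang" documentation.
inline std::string cuda_frontend() {
#if defined(__clang__) && defined(__CUDA__)
    return "clang-cuda";  // clang++ -x cuda (host or device pass)
#elif defined(__NVCC__)
    return "nvcc";        // NVIDIA's compiler driver
#else
    return "host-only";   // ordinary C++ compilation, no CUDA front end
#endif
}
```

When the file is compiled as ordinary C++ (no `-x cuda`, no nvcc), the function returns `"host-only"`.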

Device-Side C++ Standard Support

C++ core language features:

supported C++ standard notes
nvcc -6.0 c++03
nvcc 6.5 c++03, exp. c++11 undocumented
nvcc 7.0-8.0 c++03,11 only c++11 switch
nvcc 9.0-10.2 c++03,11,14 10.2 adds libcu++ (atomics); open repository: https://github.com/NVIDIA/libcudacxx/releases
nvcc 11.0.167+ c++03,11,14,17 C++11 host compiler needed for math libs; ships C++11-compatible backport of the C++20 synchronization library; device LTO added; starting with CUDA Toolkit 11.0.1, nvcc and CUDA Toolkit versions are not equivalent anymore
nvcc 12.0+ c++03,11,14,17,20
clang 5+ c++03,11,14,17
clang 6+ c++03,11,14,17,2a
clang 10+ c++03,11,14,17,20
clang 13+ c++03,11,14,17,20,2b
clang trunk c++03,11,14,17,20,2b status
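For build scripts or configuration code, the nvcc rows of the table above can be condensed into a small lookup. The sketch below is a direct transcription of those rows by major release, ignoring the experimental C++11 mode of 6.5; the function name is hypothetical, and the host compiler in use may impose a lower limit.

```cpp
#include <cassert>

// Hypothetical lookup, transcribed from the nvcc rows of the table above.
// Returns the newest C++ standard (3, 11, 14, 17 or 20) that the given nvcc
// major release accepts for device code.
inline int max_device_cxx_standard(int nvcc_major) {
    if (nvcc_major >= 12) return 20;  // nvcc 12.0+
    if (nvcc_major >= 11) return 17;  // nvcc 11.0.167+
    if (nvcc_major >= 9)  return 14;  // nvcc 9.0-10.2
    if (nvcc_major >= 7)  return 11;  // nvcc 7.0-8.0
    return 3;                         // nvcc up to 6.5 (C++03)
}
```

For instance, `max_device_cxx_standard(10)` gives 14 and `max_device_cxx_standard(12)` gives 20.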

CUDA-enabled C++ standard library libcu++, based on LLVM's libc++ (docs):

introduced components notes
CUDA 10.2 <atomic> (SM6.0+), <type_traits> introduction of libcu++
CUDA 11.0 atomic<T>::wait/notify, <barrier>, <latch>, <counting_semaphore>(SM7.0+), <chrono>, <ratio>, <functional> w/o function anticipated with GTC 2020 slides
CUDA 11.2 cuda::std::tuple,pair notes
CUDA 12.0 cuda::std::barrier
CUDA next cuda::std::complex, backports: chrono, type_traits notes
newer see the release notes and api docs all open source now

Incremental libcu++ release goals (GTC 2020):

  • Version 1 (CUDA 10.2): <atomic>(SM6.0+), <type_traits>.
  • Version 2 (CUDA next): atomic<T>::wait/notify, <barrier>, <latch>, <counting_semaphore>(SM7.0+), <chrono>, <ratio>, <functional>minus function.
  • Future priorities: atomic_ref<T>, <complex>, <tuple>, <array>, <utility>, <cmath>, string processing, ...

NVC++

NVC++ is a unified C++ compiler and GPU-accelerated STL for the CUDA platform. It also seems to support OpenACC. NVC++ currently does not support the CUDA C++ language.

supported C++ standard notes
nvc++ 11.0 ...,c++17 initial release, ships C++11-compatible backport of the C++20 synchronization library

All GPU compilers are cheese.

@ax3l
Author

ax3l commented Sep 4, 2021

Automated compiler crawling script on include/host_config.h by @haampie:
spack/spack#25054 (comment)

@Artem-B

Artem-B commented Sep 4, 2021

BTW, top-of-the-tree clang (14?) now defaults to sm_35 with CUDA support bumped up to 11.4.
Default C++ version for CUDA compilation now matches that of C++ compilation and is currently C++14.

@ax3l
Author

ax3l commented Jan 21, 2022

Thanks Artem, updated :)

@DStrelak

@Flamefire

CUDA 11.3.1 supports GCC 10.x, i.e. all GCC 10 minor versions; similarly, 11.4.1 supports all GCC 11 versions. So, to be more complete:

  • 9.2 GCC < 8
  • 10.1 - 10.2 GCC < 9
  • 11.0 GCC < 10
  • 11.1 - 11.3 GCC < 11
  • 11.4 - 11.7 GCC < 12

And for Clang:

  • 10.2 Clang < 9
  • 11.1 Clang < 11
  • 11.2 - 11.3 Clang < 12
  • 11.4 - 11.5 Clang < 13
  • 11.6 - 11.7 Clang < 14

All checked for the SDKs I have installed via the checks in crt/host_config.h. I found this after an error in PyTorch pointed me to a compatibility check based on this table.

@ax3l
Author

ax3l commented Nov 1, 2022

@Flamefire Thanks, that usually is right - I try to document the host compiler version known at release time and documented by Nvidia for the specific release to have been tested with. Minor releases of host compilers released after the CTK usually (but not always) work well together.

Updated the 11.3 and 11.4 tables according to your tests - thanks a lot!

Glad to see that PyTorch cites us! :)

@ax3l
Author

ax3l commented Nov 1, 2022

CUDA 11.2.2 supports GCC-9, not 10, see:
https://docs.nvidia.com/cuda/archive/11.2.2/cuda-installation-guide-linux/index.html

@DStrelak thanks - I think that is not as strict, checking crt/host_config.h. See @Flamefire's comment for comparison.

@DStrelak

@DStrelak thanks - I think that is not as strict, checking crt/host_config.h. See @Flamefire's comment for comparison.

I disagree, at least in my particular case it seems to be rather explicit about it :-)

/usr/local/cuda-11.2/bin/nvcc -o mycode.o -c --x cu -D_FORCE_INLINES -Xcompiler -fPIC -ccbin /usr/bin/g++-10 -std=c++14 --expt-extended-lambda -gencode=arch=compute_60,code=compute_60 -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_86,code=compute_86 -I../ -I/usr/include -I/usr/include/hdf5/serial -I/usr/include/opencv4 -Iexternal -Ilibraries mycode.cpp
/usr/include/c++/10/chrono: In substitution of 'template<class _Rep, class _Period> template<class _Period2> using __is_harmonic = std::__bool_constant<(std::ratio<((_Period2::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)) * (_Period::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den))), ((_Period2::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den)) * (_Period::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)))>::den == 1)> [with _Period2 = _Period2; _Rep = _Rep; _Period = _Period]':
/usr/include/c++/10/chrono:473:154:   required from here
/usr/include/c++/10/chrono:428:27: internal compiler error: Segmentation fault
  428 |  _S_gcd(intmax_t __m, intmax_t __n) noexcept
      |                           ^~~~~~
0x7f784e68b08f ???
	/build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
0x7f784e66c082 __libc_start_main
	../csu/libc-start.c:308
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.

@fangq

fangq commented Jul 2, 2023

I found this page very useful - I am wondering if anyone is using GitHub Actions for CI?

After GitHub retired macos-10.5/xcode 10.0.3, I can no longer compile my CUDA code using GitHub's macos-11 and newer runners. See

actions/runner-images#7838

A strange thing is that even if I copy CUDA 10.2 or 8 onto the macos-11 runner and use the nvcc from these older CUDA versions, it still complains that Apple Clang or GNU GCC are not supported. It also appears to me that this error is not thrown by crt/host_config.h.

Is there ANY way to build CUDA on macOS 11 or newer?

@Flamefire

@fangq I'd just not bother with building CUDA on CI for macos anymore if CUDA support is dropped. What is the point of testing something if it cannot be run anymore? And it seems the latest working CUDA on macos is 10.2 and that only supports Clang <= 8.x
But if you really want you can try nvcc: -allow-unsupported-compiler as mentioned in the table above.

@fangq

fangq commented Jul 3, 2023

@Flamefire, I can't run any test from the runner anyways because it does not have GPU. I need it for building binaries for supported OSes. I statically link with cudart, so even if CUDA cannot run on newer macOS, a binary can still run as long as the NVIDIA driver is installed.

I just tried the -allow-unsupported-compiler flag, unfortunately it has no effect, nvcc still complains that Apple Clang or GCC are not supported, nvcc 10.2 on macos 12; tried --allow-unsupported-compiler as well, same error.

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:34:12_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

$ make CUDACC='nvcc -allow-unsupported-compiler'
nvcc -allow-unsupported-compiler -c -g -lineinfo -Xcompiler -Wall -DSAVE_DETECTORS -use_fast_math -arch=sm_35 -DMCX_TARGET_NAME='"Fermi MCX"' -DUSE_ATOMIC -use_fast_math -DSAVE_DETECTORS -o mcx_core.o  mcx_core.cu
nvcc fatal   : The version ('14.0') of the host compiler ('Apple clang') is not supported
make: *** [mcx_core.o] Error 1

$ make CUDACC='nvcc -allow-unsupported-compiler -ccbin /usr/local/bin/gcc-7'
/bin/sh: line 0: [: /Users/fangq/Downloads/cuda/bin/nvcc: binary operator expected
nvcc -allow-unsupported-compiler -ccbin /usr/local/bin/gcc-7 -c -g -lineinfo -Xcompiler -Wall -DSAVE_DETECTORS -use_fast_math -arch=sm_35 -DMCX_TARGET_NAME='"Fermi MCX"' -DUSE_ATOMIC -use_fast_math -DSAVE_DETECTORS -o mcx_core.o  mcx_core.cu
nvcc fatal   : GNU C/C++ compiler is no longer supported as a host compiler on Mac OS X.
make: *** [mcx_core.o] Error 1

@Flamefire

@Flamefire, I can't run any test from the runner anyways because it does not have GPU. I need it for building binaries for supported OSes.

I meant: Is it worth building those binaries if they may not be run anymore (by users) as the OS stopped supporting CUDA (or the other way round)?

I just tried the -allow-unsupported-compiler flag, unfortunately it has no effect, nvcc still complains that Apple Clang or GCC are not supported, nvcc 10.2 on macos 12

https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#allow-unsupported-compiler-allow-unsupported-compiler

This option has no effect on MacOS.

So there is no solution.

@Seralpa

Seralpa commented Aug 11, 2023

@ax3l I think the 11.4.0 for g++ is wrong. It only supports up to 10.X but the table shows compatibility with version 11. From my crt/host_config.h:

#if __GNUC__ > 10

#error -- unsupported GNU version! gcc versions later than 10 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.

#endif /* __GNUC__ > 10 */

@ax3l
Author

ax3l commented Jan 18, 2024

@Seralpa which CUDA version did you refer to? CUDA 11.4.0?

@Seralpa

Seralpa commented Jan 18, 2024

@ax3l yeah, CUDA version 11.4.0

@arrio464

arrio464 commented Jul 3, 2024

In some cases, NVCC 11.4 may not work well with GCC-11, idk the specific reasons, but export NVCC_APPEND_FLAGS='-ccbin gcc-10' actually works for me.

❯ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Oct_11_21:27:02_PDT_2021
Cuda compilation tools, release 11.4, V11.4.152
Build cuda_11.4.r11.4/compiler.30521435_0

Error while compiling:

/usr/include/stdio.h(195): error: attribute "malloc" does not take arguments

See also: https://forums.developer.nvidia.com/t/cuda-11-5-samples-throw-multiple-error-attribute-malloc-does-not-take-arguments/192750

@66CCFF

66CCFF commented Jul 11, 2024

In some cases, NVCC 11.4 may not work well with GCC-11, idk the specific reasons, but export NVCC_APPEND_FLAGS='-ccbin gcc-10' actually works for me.

❯ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Oct_11_21:27:02_PDT_2021
Cuda compilation tools, release 11.4, V11.4.152
Build cuda_11.4.r11.4/compiler.30521435_0

Error while compiling:

/usr/include/stdio.h(195): error: attribute "malloc" does not take arguments

See also: https://forums.developer.nvidia.com/t/cuda-11-5-samples-throw-multiple-error-attribute-malloc-does-not-take-arguments/192750

I'm trying to get Cuda 12.1 working with libtorch and gcc-14, and that also works for me. Thanks.
