- Read these docs first to understand the passthrough subject better
We are using the GPU SeaBIOS PCI Express passthrough method.
There is no significant difference in GPU performance between SeaBIOS and OVMF for GPU passthrough; the choice mainly affects the boot process and compatibility. Once the guest OS is loaded, GPU performance is similar with both. OVMF is generally recommended for better compatibility and modern hardware support.
- Make sure the vfio kernel modules are loaded
Update the /etc/modules file with the following contents:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
Note that after modifying this file you have to run update-initramfs -u -k all. Since more files below require the same step, you can do it once at the end.
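After a host reboot (or after loading the modules manually with modprobe), you can confirm they are present. A minimal sketch — the helper name check_vfio_modules is our own, not part of any tool:

```shell
# Hypothetical helper: reads `lsmod` output on stdin and reports
# whether each required vfio module is loaded.
check_vfio_modules() {
  required="vfio vfio_iommu_type1 vfio_pci vfio_virqfd"
  input="$(cat)"
  for m in $required; do
    if printf '%s\n' "$input" | awk '{print $1}' | grep -qx "$m"; then
      echo "ok: $m"
    else
      echo "missing: $m"
    fi
  done
}

# Usage on the host:
#   lsmod | check_vfio_modules
```

Note that on newer kernels the vfio_virqfd functionality may be built into the vfio module itself, so a "missing" report for vfio_virqfd alone is not necessarily a problem.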
- Blacklist the GPU drivers completely on the host, so the card is left unbound and free for vfio-pci to claim for passthrough
Update the /etc/modprobe.d/blacklist.conf file with the following contents:
blacklist amdgpu
blacklist radeon
blacklist nouveau
blacklist nvidia
Note that after modifying this file you have to run update-initramfs -u -k all. Since more files below require the same step, you can do it once at the end.
- Passthrough your NVIDIA GPU
Find your NVIDIA GPU card:
# lspci -nnk
...
81:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)
Subsystem: NVIDIA Corporation TU104GL [Tesla T4] [10de:12a2]
Kernel driver in use: nouveau
Kernel modules: nvidiafb, nouveau
Update the /etc/modprobe.d/vfio.conf file with the following contents:
## GPU: 81:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)
options vfio-pci ids=10de:1eb8
Note that after modifying this file you have to run update-initramfs -u -k all; the next step covers this.
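The vendor:device ID pair that goes into vfio.conf can be pulled straight out of lspci's numeric output. A small sketch — the helper name pci_ids is our own, and the bus address 81:00.0 is taken from the example above:

```shell
# Hypothetical helper: given `lspci -n` output on stdin, print the
# vendor:device ID pair. `lspci -n` emits lines like:
#   81:00.0 0302: 10de:1eb8 (rev a1)
# where the third whitespace-separated field is the ID pair vfio-pci needs.
pci_ids() { awk '{print $3}'; }

# Usage on the host (adjust the bus address to your GPU):
#   lspci -n -s 81:00.0 | pci_ids
```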
- Refresh your initramfs
After you are done updating the /etc/modules, /etc/modprobe.d/blacklist.conf, and /etc/modprobe.d/vfio.conf files, make sure to run the following command to get these updates into your initramfs file:
update-initramfs -u -k all
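Once the host is rebooted (a later step in this guide), the GPU should be bound to vfio-pci instead of nouveau. A sketch for checking this — the helper name driver_in_use is our own:

```shell
# Hypothetical helper: reads `lspci -nnk` output on stdin and prints
# the value of its "Kernel driver in use" line.
driver_in_use() { sed -n 's/^[[:space:]]*Kernel driver in use: //p'; }

# Usage on the host:
#   lspci -nnk -s 81:00.0 | driver_in_use
# Before the changes take effect this prints "nouveau";
# afterwards it should print "vfio-pci".
```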
- Update your VM
First power off the VM!
Go to Proxmox VE Web -> Select your VM -> Hardware -> Machine: change from i440fx to q35
When changing the machine type from i440fx to q35, the guest's PCI topology changes, so make sure to rename the NICs.
When using netplan, update the NIC names in your /etc/netplan/01-netcfg.yaml file as follows:
ens18 -> enp6s18
ens19 -> enp6s19
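The rename can be scripted as a plain text substitution. A sketch — the helper name rename_nics is our own, and the old/new names are the ones from the example above:

```shell
# Hypothetical helper: rewrite the i440fx NIC names to their q35
# equivalents (stdin -> stdout).
rename_nics() { sed -e 's/ens18/enp6s18/g' -e 's/ens19/enp6s19/g'; }

# Usage on the VM (review the result before overwriting the original):
#   rename_nics < /etc/netplan/01-netcfg.yaml > /tmp/01-netcfg.yaml
#   # inspect /tmp/01-netcfg.yaml, move it into place, then run:
#   #   netplan apply
```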
It is worth making sure you have console access (check that the root or a user's credentials + sudo work) so that you can fix the networking should the NIC names turn out to be different.
To pass through the device you need to set the hostpciX option in the VM configuration, for example by executing (81:00 is our GPU):
qm set <VMID> -hostpci0 81:00,pcie=on
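If you want to double-check the result, the command above ends up as a hostpci0 line in the VM's configuration file. A sketch of what to expect — the exact rendering (e.g. pcie=1 vs pcie=on) may vary by Proxmox VE version:

```
# /etc/pve/qemu-server/<VMID>.conf (excerpt)
machine: q35
hostpci0: 81:00,pcie=1
```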
- Reboot the Proxmox VE host
- Power on your VM
- Install the drivers
Now you can install the NVIDIA drivers in your VM following this doc:
- Verify
The GPU NVIDIA Tesla T4 is now available and is working in the VM:
root@worker-01:~# lspci -s 01:00.0 -nnvk
01:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)
Subsystem: NVIDIA Corporation TU104GL [Tesla T4] [10de:12a2]
Physical Slot: 0
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at e0000000 (64-bit, prefetchable) [size=32M]
Capabilities: [60] Power Management version 3
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [c8] MSI-X: Enable+ Count=6 Masked-
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
root@worker-01:~# nvidia-smi
Mon May 8 11:08:37 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03 Driver Version: 530.41.03 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla T4 Off| 00000000:01:00.0 Off | 0 |
| N/A 34C P8 13W / 70W| 2MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
root@worker-01:~#
intel_iommu=on / amd_iommu=on
There is no need to enable intel_iommu=on / amd_iommu=on on the kernel command line where it is already enabled by default, which is typically the case for Linux kernels >= 5.15.
iommu=pt
Using iommu=pt sets the IOMMU to passthrough mode, which can improve performance in certain cases. However, it is generally not recommended for GPU passthrough, as it might reduce the isolation and security provided by the IOMMU. Stick to the default settings unless there is a specific reason to use passthrough mode and you understand the implications.
- Updating the kernel parameters (aka the kernel command line)
For ext4-based systems, simply update /etc/default/grub and run update-grub.
For ZFS-based systems (rootfs over ZFS), use /etc/kernel/cmdline and run proxmox-boot-tool refresh.
Read https://pve.proxmox.com/wiki/Host_Bootloader#sysboot_edit_kernel_cmdline for more refs.