Proxmox GPU Passthrough Guide (NVIDIA Tesla P4)#


1. Proxmox Host Configuration#

Pre-requisites#

  • Ensure PCIe Bifurcation is set to OFF in the server BIOS.

Enable IOMMU#

  1. Open the GRUB configuration:
nano /etc/default/grub
  2. Modify the line GRUB_CMDLINE_LINUX_DEFAULT="quiet" to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
  3. Update GRUB:
update-grub
  4. Reboot the host.
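
Before rebooting, it can help to confirm that the GRUB edit actually landed. The following is a minimal sketch, assuming the default file location used above; the helper name has_iommu_flag is hypothetical and not part of any Proxmox tooling.

```shell
# Hypothetical helper: succeeds if the GRUB default command line in the
# given file contains intel_iommu=on.
has_iommu_flag() {
    grep -Eq '^GRUB_CMDLINE_LINUX_DEFAULT=.*intel_iommu=on' "$1" 2>/dev/null
}

if has_iommu_flag /etc/default/grub; then
    echo "intel_iommu=on is set in /etc/default/grub"
else
    echo "intel_iommu=on not found in /etc/default/grub"
fi
```

After the reboot, the same flag should also appear in /proc/cmdline, which reflects the command line the running kernel was booted with.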

Verify IOMMU Status#

dmesg | grep -e DMAR -e IOMMU

Note: Look for the line: DMAR: IOMMU enabled.
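
Once IOMMU is active, the kernel exposes device groupings under /sys/kernel/iommu_groups. As an additional check, the sketch below lists each group together with the PCI addresses it contains, so you can see where the GPU ended up. The helper name list_iommu_groups is hypothetical; the optional argument exists only so the function can be pointed at a different directory.

```shell
# Hypothetical helper: print "group <N> <PCI address>" for every device
# under an IOMMU group tree (defaults to the kernel's sysfs location).
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # skip the unexpanded glob when empty
        group="${dev#"$root"/}"        # strip the root prefix
        group="${group%%/*}"           # keep only the group number
        echo "group $group ${dev##*/}"
    done
}

list_iommu_groups
```

If the command prints nothing, the directory is empty, which usually means IOMMU is not enabled yet.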


2. VM Configuration (Ubuntu 24.04)#

VM Hardware Settings#

  • Machine: q35
  • BIOS: SeaBIOS
  • Note: Do NOT run apt update or upgrade yet.

Add PCIe Device#

  1. Shut down the VM.
  2. Go to Hardware > Add > PCI Device.
  3. Select Raw Device > Select your GPU.
  4. Ensure All Functions and PCI-Express are checked.
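
The same settings can also be applied from the Proxmox host shell with the qm CLI. The sketch below only prints the commands rather than running them. The VM ID 100 and the PCI address 0000:01:00 are placeholders, and print_passthrough_cmds is a hypothetical helper; substitute the values for your own VM and GPU.

```shell
# Hypothetical helper: print (not run) the qm commands that mirror the GUI
# settings above. VM ID and PCI address are placeholders from the caller.
print_passthrough_cmds() {
    vmid="$1"
    pci_addr="$2"   # address without a ".0" function suffix = all functions
    echo "qm set $vmid --machine q35 --bios seabios"
    echo "qm set $vmid --hostpci0 $pci_addr,pcie=1"
}

print_passthrough_cmds 100 0000:01:00
```

Here pcie=1 corresponds to the PCI-Express checkbox, and omitting the function suffix from the address corresponds to All Functions.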

3. Driver Installation in VM#

Verify Hardware Detection#

lspci -v
lspci -nnk | grep -iA2 nvidia
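
To read off just the driver bound to the card, the lspci -nnk output can be filtered further. This is a minimal sketch; gpu_kernel_driver is a hypothetical helper that reads lspci -nnk text on standard input. Before the NVIDIA driver is installed, it may print nothing or a different driver name.

```shell
# Hypothetical helper: read `lspci -nnk` output on stdin and print the
# "Kernel driver in use" value for every NVIDIA device found.
gpu_kernel_driver() {
    awk '
        # Device lines start at column 0 with a hex bus address.
        /^[0-9a-f]/ { in_gpu = (tolower($0) ~ /nvidia/) }
        in_gpu && /Kernel driver in use:/ {
            sub(/.*Kernel driver in use: */, "")
            print
        }
    '
}

if command -v lspci >/dev/null 2>&1; then
    lspci -nnk | gpu_kernel_driver
fi
```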

Prepare Build Environment#

Required for version 550+ drivers:

apt install build-essential -y
# Verify version
gcc --version
# Create GCC 13 symlink (Required fix)
ln -s -f /usr/bin/gcc-13 /usr/bin/gcc
reboot
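
After the reboot, you may want to confirm that the symlink still points at GCC 13. The following is a minimal sketch, assuming GNU readlink is available; gcc_points_to is a hypothetical helper. It matches on the end of the resolved file name because /usr/bin/gcc-13 can itself be a symlink to an architecture-prefixed binary.

```shell
# Hypothetical helper: succeeds if the given path resolves to a file whose
# name ends with the expected compiler name (for example gcc-13).
gcc_points_to() {
    target=$(readlink -f "$1") || return 1
    case "${target##*/}" in
        *"$2") return 0 ;;
        *)     return 1 ;;
    esac
}

if gcc_points_to /usr/bin/gcc gcc-13; then
    echo "/usr/bin/gcc resolves to GCC 13"
else
    echo "/usr/bin/gcc does not resolve to GCC 13"
fi
```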

Install NVIDIA Driver 550#

Note: Installation may pause at 81% for several minutes. This is normal.

apt install nvidia-driver-550 -y
reboot
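
Once the VM is back up, a quick way to see whether the kernel module loaded is to look for it in /proc/modules. This is a minimal sketch; nvidia_module_loaded is a hypothetical helper, and the optional argument only exists so it can read a different file.

```shell
# Hypothetical helper: succeeds if a module named exactly "nvidia" is
# listed in the given modules file (defaults to /proc/modules).
nvidia_module_loaded() {
    grep -q '^nvidia ' "${1:-/proc/modules}" 2>/dev/null
}

if nvidia_module_loaded; then
    echo "nvidia kernel module is loaded"
else
    echo "nvidia kernel module is not loaded"
fi
```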

4. Maintenance & Upgrades#

Upgrade Driver from 470 to 550#

apt-get purge nvidia-driver-470 -y
ln -s -f /usr/bin/gcc-13 /usr/bin/gcc
reboot
apt install nvidia-driver-550 -y
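
Before purging, it can be useful to see which driver packages are actually installed, and to run the same check afterwards to confirm that only the 550 package remains. This is a minimal sketch; installed_nvidia_drivers is a hypothetical helper that filters dpkg -l output read from standard input.

```shell
# Hypothetical helper: read `dpkg -l` output on stdin and print the names
# of installed (state "ii") nvidia-driver-<version> packages.
installed_nvidia_drivers() {
    awk '$1 == "ii" && $2 ~ /^nvidia-driver-[0-9]+/ { print $2 }'
}

if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | installed_nvidia_drivers
fi
```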

Monitoring Tools#

# Check version
cat /proc/driver/nvidia/version
# Standard status
nvidia-smi
# Visual monitor
apt install nvtop
nvtop
# High-frequency watch (0.5s)
watch -n 0.5 nvidia-smi
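
For logging or scripting, nvidia-smi can also emit selected fields as CSV through its --query-gpu option. The sketch below turns one CSV line into a compact status line; format_gpu_status is a hypothetical helper, and the field selection is only one possible choice.

```shell
# Hypothetical helper: read "name, temp, util, mem_used, mem_total" CSV
# lines on stdin and print one readable status line per GPU.
format_gpu_status() {
    awk -F', ' '{ printf "%s: %s C, %s%% util, %s/%s MiB\n", $1, $2, $3, $4, $5 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi \
        --query-gpu=name,temperature.gpu,utilization.gpu,memory.used,memory.total \
        --format=csv,noheader,nounits | format_gpu_status
fi
```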

5. Performance & Reliability#

Enable Persistence Mode#

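Persistence mode keeps the NVIDIA driver initialized even when no process is using the GPU, which eliminates the "cold start" lag on the first call after an idle period.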

  1. Create the override directory:
mkdir -p /etc/systemd/system/nvidia-persistenced.service.d
  2. Create the override file:
cat <<EOF > /etc/systemd/system/nvidia-persistenced.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --persistence-mode
EOF
  3. Apply settings:
systemctl daemon-reload
systemctl restart nvidia-persistenced
  4. Verify mode is enabled:
nvidia-smi -q | grep "Persistence Mode"
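
If you want the verification step to return a pass or fail result, for example in a provisioning script, the grep can be wrapped as follows. This is a minimal sketch; persistence_enabled is a hypothetical helper that reads nvidia-smi -q output on standard input.

```shell
# Hypothetical helper: succeeds if the nvidia-smi -q text on stdin reports
# "Persistence Mode : Enabled" for at least one GPU.
persistence_enabled() {
    grep -Eq 'Persistence Mode[[:space:]]*:[[:space:]]*Enabled'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi -q | persistence_enabled; then
        echo "persistence mode is enabled"
    else
        echo "persistence mode is NOT enabled"
    fi
fi
```

Running systemctl cat nvidia-persistenced additionally shows whether the override file was picked up by systemd.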

6. Docker GPU Integration#

Install NVIDIA Container Toolkit#

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt-get update && apt-get install -y nvidia-container-toolkit
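
The sed step in the pipeline above is what ties the repository to the downloaded key. The sketch below isolates that substitution so its effect on a single line is visible. add_signed_by is a hypothetical wrapper, and the sample input line is for illustration only and may differ from the current contents of the published list file.

```shell
KEYRING=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Hypothetical wrapper around the same sed expression used above: insert a
# [signed-by=...] option after "deb" on every repository line read from stdin.
add_signed_by() {
    sed "s#deb https://#deb [signed-by=$KEYRING] https://#g"
}

# Illustrative sample line; the real list file may differ.
echo 'deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /' | add_signed_by
```

After installation, /etc/apt/sources.list.d/nvidia-container-toolkit.list should contain the signed-by option on each deb line.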

Configure Docker Runtime#

nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
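
To confirm the runtime was registered, you can check that Docker's daemon configuration now mentions it. This is a minimal sketch, assuming the default configuration path /etc/docker/daemon.json; has_nvidia_runtime is a hypothetical helper and performs only a simple text match, not a full JSON parse.

```shell
# Hypothetical helper: succeeds if the given Docker daemon config mentions
# a runtime named "nvidia" (plain text match, not a JSON parser).
has_nvidia_runtime() {
    grep -q '"nvidia"' "${1:-/etc/docker/daemon.json}" 2>/dev/null
}

if has_nvidia_runtime; then
    echo "nvidia runtime is present in /etc/docker/daemon.json"
else
    echo "nvidia runtime not found in /etc/docker/daemon.json"
fi

# End-to-end check (run manually; pulls the ubuntu image):
#   docker run --rm --gpus all ubuntu nvidia-smi
```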