Configuring VFIO Passthrough for KubeVirt GPUs
Follow these steps to configure a Kubernetes node for VFIO passthrough, enabling GPU resources for KubeVirt VMs.
Step 1: Cordon and Drain the Node
To prevent new pods from being scheduled and to safely migrate existing workloads, cordon and drain the node:
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force
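You can confirm that the node is unschedulable before proceeding; its STATUS should include SchedulingDisabled (replace <node-name> with your node's name):
kubectl get node <node-name>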
Step 2: Label the Node
Add a label to the node indicating it is configured for VFIO passthrough:
kubectl label nodes <node-name> nautilus.io/vfio=true
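To verify that the label was applied, you can list the nodes matching it:
kubectl get nodes -l nautilus.io/vfio=true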
Step 3: SSH into the Node
SSH into the node.
Step 4: Record Installed NVIDIA Driver Names
List the installed NVIDIA driver packages and note their versions in case you need to reinstall them later:
dpkg -l | grep nvidia
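If you want to keep a copy of this list for later reference, one option (the file path is just an example) is to save it to a file:
dpkg -l | grep nvidia | tee ~/nvidia-packages-before-removal.txt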
Step 5: Uninstall All NVIDIA Drivers
Uninstall the NVIDIA drivers from the node:
sudo apt-get remove --purge '^nvidia-.*'
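Depending on how the drivers were installed, related library packages (libnvidia-*) may remain. If the list from Step 4 shows any, one way to clean them up is a similar purge followed by an autoremove; check the list before running:
sudo apt-get remove --purge '^libnvidia-.*'
sudo apt-get autoremove -y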
Step 6: Enable IOMMU and Shut Down the Node
Modify the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub to include iommu=pt amd_iommu=on:
sudo vim /etc/default/grub
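After editing, the line should look roughly like the following (keep any parameters already present on your system; amd_iommu=on applies to AMD CPUs, while Intel systems typically use intel_iommu=on instead):
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iommu=pt amd_iommu=on"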
Then update GRUB and shut down the node:
sudo update-grub
sudo shutdown now
Step 7: Enable IOMMU in BIOS
Power the machine back on and enter the BIOS/UEFI settings. Locate and enable the IOMMU option, usually found under Advanced Settings, Chipset, or CPU Configuration. Save the changes and exit the BIOS.
Step 8: Start the Node Back Up
Let the node boot back up after enabling IOMMU in the BIOS.
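To confirm that the IOMMU is active after the reboot, you can check the kernel log; the exact messages vary by platform and kernel version:
sudo dmesg | grep -i -e iommu -e amd-vi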
Step 9: Verify Driver Removal
Check that the NVIDIA driver packages are no longer present; the command should return no NVIDIA driver entries:
dpkg -l | grep nvidia
Step 10: Install driverctl
Install driverctl to manage device bindings:
sudo apt install driverctl -y
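To confirm the installation and view the current driver binding for each PCI device, you can list them with driverctl:
driverctl list-devices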
Step 11: Bind the GPU to vfio-pci
Bind the specific GPU (e.g., 81:00.0) to vfio-pci:
sudo driverctl set-override 0000:81:00.0 vfio-pci
Replace 81:00.0 with the correct PCI address for your GPU.
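If you are unsure of the PCI address, you can list the NVIDIA devices first; note that driverctl expects the full address including the 0000: domain prefix, as in the command above:
lspci -nn | grep -i nvidia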
Step 12: Check if the Devices are Bound to vfio-pci
Verify that the GPU is bound to vfio-pci:
lspci -k -s 81:00.0
To list all vfio-pci-bound devices:
lspci -nnk | grep -i vfio
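For a correctly bound device, the lspci -k output includes a line like the following (illustrative):
Kernel driver in use: vfio-pci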
Step 13: Verify DaemonSet for GPU Management
After binding the GPUs, check the logs of the nvidia-kubevirt-gpu-dp-daemonset pod (the pod name below is a placeholder; you can look up the actual name as shown further below):
kubectl logs -n kubevirt nvidia-kubevirt-gpu-dp-daemonset-pod-name
Look for lines like:
2024/10/31 03:16:06 Allocated devices map[PCI_RESOURCE_NVIDIA_COM_TU102GL_QUADRO_RTX_6000_8000:0000:81:00.0]
This confirms that the device plugin has picked up the VFIO-bound GPU and is exposing it as a resource that KubeVirt can assign to VMs.
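If you are unsure of the exact pod name, you can look it up by its label, assuming the DaemonSet runs in the kubevirt namespace with the app=nvidia-kubevirt-gpu-dp label (the same selector used in Step 15):
kubectl get pods -n kubevirt -l app=nvidia-kubevirt-gpu-dp -o wide
The -o wide output includes the node name, which helps you pick the pod running on the node you just configured.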
Step 14: Add the GPU Resource Name to KubeVirt
Edit the KubeVirt configuration to add the GPU resource name:
kubectl edit kubevirt kubevirt
In the permittedHostDevices section (under spec.configuration), add your GPU entry:
permittedHostDevices:
  pciHostDevices:
  - externalResourceProvider: true
    pciVendorSelector: 10de:2236
    resourceName: nvidia.com/GA102GL_A10
  - externalResourceProvider: true
    pciVendorSelector: 10de:20b5
    resourceName: nvidia.com/GA100_A100_PCIE_80GB
  - externalResourceProvider: true
    pciVendorSelector: 10de:1e30
    resourceName: nvidia.com/TU102GL_QUADRO_RTX_6000_8000
Find the pciVendorSelector values by running:
lspci -nn
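The pciVendorSelector is the vendor:device ID pair shown in square brackets at the end of the device's line. For the Quadro RTX 6000/8000 above, the output looks roughly like this (illustrative; exact wording varies):
81:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102GL [Quadro RTX 6000/8000] [10de:1e30] (rev a1)
Here, 10de:1e30 is the value to use for pciVendorSelector.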
Step 15: Restart the DaemonSet Pod
After updating the configuration, restart the NVIDIA DaemonSet pod:
kubectl delete pod -l app=nvidia-kubevirt-gpu-dp -n kubevirt
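Once the new pod is running, you can confirm that the GPU resource is now advertised on the node and, if you cordoned the node in Step 1, make it schedulable again (replace <node-name> with your node's name):
kubectl describe node <node-name> | grep -i nvidia.com
kubectl uncordon <node-name>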
Conclusion
Your Kubernetes node is now fully configured for VFIO passthrough, enabling GPU resources for KubeVirt VMs. You can test this configuration using one of the KubeVirt virtualization examples, such as Running Virtualization on Windows.