GPU Scheduling
How to manage GPU resources in Kubernetes — install NVIDIA device plugins, request GPUs in Pod specs, use node affinity and taints to direct workloads to GPU nodes.
How GPUs Work in Kubernetes
Kubernetes does not natively understand GPUs. It relies on device plugins to discover and advertise GPU resources on nodes. The NVIDIA device plugin is the most common, making NVIDIA GPUs available as schedulable resources.
The GPU Stack
- Hardware: NVIDIA GPU installed in the node (A100, V100, T4, etc.)
- Driver: NVIDIA GPU driver installed on the host OS
- Container runtime: NVIDIA Container Toolkit (the successor to nvidia-docker) enables GPU access inside containers
- Device plugin: NVIDIA device plugin DaemonSet runs on each GPU node and registers GPUs with the kubelet
- Pod spec: Request GPUs using nvidia.com/gpu in the container's resource limits
Installing the NVIDIA Device Plugin
The device plugin runs as a DaemonSet, ensuring one Pod per GPU node.
# Deploy the NVIDIA device plugin DaemonSet
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml
# Verify GPUs are discovered
kubectl get nodes -o json | jq '.items[].status.capacity'
# Look for: "nvidia.com/gpu": "4"
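To confirm the rollout from the command line, the checks above can be wrapped in two small helpers. This is a minimal sketch: the function names are hypothetical, and the DaemonSet name and namespace you pass in are assumptions that depend on the manifest you applied.

```shell
# Hypothetical helpers; they assume kubectl is already configured for the cluster.

# Succeeds when every scheduled device plugin Pod reports Ready.
# $1 = namespace, $2 = DaemonSet name (both depend on the manifest you applied).
plugin_ready() {
  local desired ready
  desired=$(kubectl get daemonset "$2" -n "$1" -o jsonpath='{.status.desiredNumberScheduled}')
  ready=$(kubectl get daemonset "$2" -n "$1" -o jsonpath='{.status.numberReady}')
  [ -n "$desired" ] && [ "$desired" -gt 0 ] && [ "$desired" = "$ready" ]
}

# Prints one row per node with its allocatable GPU count (<none> for non-GPU nodes).
list_gpu_nodes() {
  kubectl get nodes -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}
```

For example, plugin_ready kube-system nvidia-device-plugin-daemonset && list_gpu_nodes prints the table only once the plugin is ready. The name and namespace in that call are assumed, so check them first with kubectl get daemonsets -A.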
Requesting GPUs in Pod Specs
Request GPUs using the extended resource nvidia.com/gpu in the container's resource section.
# Pod requesting 2 GPUs for training
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training
spec:
  containers:
  - name: trainer
    image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
    command: ["python", "train.py", "--gpus", "2"]
    resources:
      limits:
        nvidia.com/gpu: 2
        memory: "32Gi"
        cpu: "8"
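A server-side dry run is one way to see how the API server treats a given GPU quantity before anything is scheduled. The sketch below is illustrative only: the helper names and the Pod name gpu-quantity-check are hypothetical, and it assumes kubectl is configured for a cluster you are allowed to submit Pods to.

```shell
# Hypothetical helper: prints a minimal Pod manifest that asks for $1 GPUs under limits.
# The Pod name gpu-quantity-check is made up for this example.
gpu_pod_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-quantity-check
spec:
  containers:
  - name: trainer
    image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
    resources:
      limits:
        nvidia.com/gpu: $1
EOF
}

# Succeeds if the API server would accept a Pod asking for $1 GPUs.
# --dry-run=server validates the object on the server without persisting it.
gpu_quantity_accepted() {
  gpu_pod_manifest "$1" | kubectl apply --dry-run=server -f - >/dev/null 2>&1
}
```

Running gpu_quantity_accepted 2 && echo accepted should print accepted on a working cluster, while a fractional value such as 0.5 should be rejected. A successful dry run only means the manifest is valid; it does not guarantee that a node with free GPUs exists.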
Note: Specify GPUs under limits (not requests). If you set only the limit, Kubernetes uses it as the request; if you set both, the two values must be equal. GPU resources are always whole numbers, so you cannot request 0.5 GPUs.
Node Affinity for GPU Nodes
Node affinity rules ensure Pods are scheduled on nodes with specific characteristics. Use this to direct ML workloads to GPU-equipped nodes.
# Pod with node affinity for GPU nodes
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu-type
            operator: In
            values:
            - a100
            - v100
  containers:
  - name: trainer
    image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
    resources:
      limits:
        nvidia.com/gpu: 1
Node Labels for GPU Management
# Label GPU nodes with GPU type
kubectl label node gpu-node-1 gpu-type=a100
kubectl label node gpu-node-2 gpu-type=v100
kubectl label node gpu-node-3 gpu-type=t4
# Label by GPU memory
kubectl label node gpu-node-1 gpu-memory=80Gi
kubectl label node gpu-node-2 gpu-memory=32Gi
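Once the labels are in place, it can help to preview which nodes a node affinity rule with the In operator would match. The helper below is a sketch with a hypothetical name; it assumes the gpu-type label key used above and relies on kubectl's set-based label selector syntax.

```shell
# Hypothetical helper; assumes nodes carry the gpu-type label applied above.
# Lists nodes whose gpu-type label is one of the given values, using a
# set-based selector such as: gpu-type in (a100,v100)
nodes_with_gpu_type() {
  local IFS=,   # makes "$*" join the arguments with commas
  kubectl get nodes -l "gpu-type in ($*)" -o name
}
```

For example, nodes_with_gpu_type a100 v100 lists the nodes that the a100/v100 affinity rule shown earlier can target. To see the labels as columns instead, kubectl get nodes -L gpu-type,gpu-memory works without any helper.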
Taints and Tolerations
Taints prevent non-GPU workloads from being scheduled on expensive GPU nodes. Tolerations allow specific Pods to be scheduled on tainted nodes.
# Taint GPU nodes to repel non-GPU workloads
kubectl taint nodes gpu-node-1 nvidia.com/gpu=present:NoSchedule
# Pod with toleration for GPU taint
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training-tolerant
spec:
  tolerations:
  - key: "nvidia.com/gpu"
    operator: "Equal"
    value: "present"
    effect: "NoSchedule"
  containers:
  - name: trainer
    image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
    resources:
      limits:
        nvidia.com/gpu: 1
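Before relying on the toleration, it is worth checking that the taint is actually present on the node. The helpers below are a minimal sketch with hypothetical names; they assume the taint key, value, and effect used in this section.

```shell
# Hypothetical helpers; they assume kubectl is configured for the cluster.

# Prints each taint on node $1 as key=value:effect, one per line.
node_taints() {
  kubectl get node "$1" -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}'
}

# Succeeds if node $1 carries the GPU taint used in this section.
has_gpu_taint() {
  node_taints "$1" | grep -qxF 'nvidia.com/gpu=present:NoSchedule'
}
```

A possible use is has_gpu_taint gpu-node-1 || kubectl taint nodes gpu-node-1 nvidia.com/gpu=present:NoSchedule, which applies the taint only when it is missing. Keep in mind that a toleration only permits scheduling onto tainted nodes; it does not steer a Pod toward them, so it is usually paired with a GPU request or node affinity.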
Resource Quotas for GPUs
Limit GPU usage per namespace to prevent a single team from monopolizing all GPUs.
# GPU quota for the training namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: ml-training
spec:
  hard:
    # Quotas on extended resources accept only the requests. prefix
    requests.nvidia.com/gpu: "8"
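To see how much of the quota is left, the hard and used values can be read from the ResourceQuota status. The helper below is a sketch with a hypothetical name; it assumes the quota tracks requests.nvidia.com/gpu as in the manifest above.

```shell
# Hypothetical helper; assumes the quota tracks requests.nvidia.com/gpu.
# Prints how many GPUs can still be requested in the namespace.
# $1 = namespace, $2 = ResourceQuota name
gpu_quota_remaining() {
  local hard used
  hard=$(kubectl get resourcequota "$2" -n "$1" -o jsonpath='{.status.hard.requests\.nvidia\.com/gpu}')
  used=$(kubectl get resourcequota "$2" -n "$1" -o jsonpath='{.status.used.requests\.nvidia\.com/gpu}')
  # Missing values are treated as 0 so the arithmetic never fails
  echo $(( ${hard:-0} - ${used:-0} ))
}
```

For example, gpu_quota_remaining ml-training gpu-quota prints the number of GPUs the namespace may still request. For a human-readable view of the same data, kubectl describe resourcequota gpu-quota -n ml-training is enough.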
Monitoring GPU Utilization
- nvidia-smi — Run inside a Pod to check GPU utilization, memory usage, and temperature
- DCGM Exporter — NVIDIA Data Center GPU Manager exports GPU metrics to Prometheus
- kubectl describe node — Shows allocated vs. allocatable GPU resources per node
# Check GPU allocation on a node
kubectl describe node gpu-node-1 | grep -A 10 "Allocated resources"
# Look for the nvidia.com/gpu row, e.g. 2 requested and 2 limited; compare with the 4 listed under Allocatable
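For the nvidia-smi option, the query can be run from outside the Pod with kubectl exec. The helper below is a sketch with a hypothetical name; it assumes nvidia-smi is available inside the container, which is normally the case when the NVIDIA container runtime is in use.

```shell
# Hypothetical helper; assumes nvidia-smi is available inside the container.
# Prints per-GPU utilization, memory, and temperature as CSV for Pod $1.
# $2 = namespace (defaults to "default")
pod_gpu_stats() {
  kubectl exec "$1" -n "${2:-default}" -- nvidia-smi \
    --query-gpu=index,utilization.gpu,memory.used,memory.total,temperature.gpu \
    --format=csv
}
```

For example, pod_gpu_stats gpu-training prints one CSV row per GPU visible to that Pod. This is a point-in-time reading; for continuous metrics, DCGM Exporter with Prometheus is the better fit.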
Practice Questions
A Pod requesting nvidia.com/gpu: 1 is stuck in Pending state. The cluster has GPU nodes with available GPUs. What is the most likely cause?
A) The NVIDIA device plugin DaemonSet is not running on the GPU nodes
B) The Pod is missing a readiness probe
C) The GPU nodes have insufficient CPU
D) The Pod needs a Service to be created first
Answer:
A) The NVIDIA device plugin DaemonSet is not running on the GPU nodes. Without the device plugin, the kubelet does not know about GPU resources, so nvidia.com/gpu is not advertised as an allocatable resource. The scheduler cannot find a node with available GPUs, leaving the Pod in Pending state. Verify with kubectl describe node and check for nvidia.com/gpu in the allocatable resources.
A) Labels and annotations
B) Taints and tolerations
C) ResourceQuotas and LimitRanges
D) NetworkPolicies and Services
Answer:
B) Taints and tolerations. Taint the GPU nodes so that only Pods with matching tolerations can be scheduled there. Add the toleration to your ML training Pod specs. This prevents non-ML workloads (web servers, databases, etc.) from consuming GPU node resources.
A) The Pod is scheduled and receives half a GPU
B) The Pod is scheduled and receives one full GPU
C) The Pod fails validation because GPU requests must be whole numbers
D) The Pod is scheduled with GPU time-slicing enabled automatically
Answer:
C) The Pod fails validation because GPU requests must be whole numbers. Extended resources like nvidia.com/gpu must be requested in whole numbers. You cannot request fractional GPUs through the standard Kubernetes resource model. GPU sharing requires additional configuration like NVIDIA MIG or GPU time-slicing, which are not standard CKA topics.
You need a Pod to run only on A100 nodes. Nodes are labeled gpu-type=a100, gpu-type=v100, or gpu-type=t4. Which scheduling feature should you use?
A) PriorityClass
B) Pod topology spread constraints
C) Node affinity with requiredDuringSchedulingIgnoredDuringExecution
D) Pod affinity
Answer:
C) Node affinity with requiredDuringSchedulingIgnoredDuringExecution. Node affinity allows you to constrain which nodes a Pod can be scheduled on based on node labels. Using requiredDuringScheduling makes it a hard requirement — the Pod will only be scheduled on nodes where gpu-type=a100. Pod affinity is for co-locating Pods with other Pods, not for targeting specific node types.
A) kube-scheduler
B) kube-proxy
C) NVIDIA device plugin
D) Container runtime
Answer:
C) NVIDIA device plugin. The device plugin runs as a DaemonSet on GPU nodes. It discovers the GPUs on the node, registers them with the kubelet via the device plugin framework, and makes them available as extended resources (nvidia.com/gpu). The scheduler then uses this information when placing Pods.