Multi-Instance GPU (MIG)
Partition NVIDIA A100 and H100 GPUs into hardware-isolated instances with guaranteed compute, memory bandwidth, and fault isolation for multi-tenant Kubernetes clusters.
Understanding MIG
Multi-Instance GPU (MIG) is a hardware-level feature available on NVIDIA A100, A30, and H100 GPUs that allows a single GPU to be partitioned into up to seven independent instances. Each instance has dedicated compute units, memory, and memory bandwidth — providing true hardware isolation.
Hardware isolation: Unlike time-slicing, MIG provides guaranteed resources. A workload running on a MIG instance cannot be affected by other workloads on the same GPU — no memory contention, no compute interference, and faults are isolated.
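Before partitioning, it is worth confirming whether MIG mode is already enabled. A quick check using `nvidia-smi`'s query interface (the `mig.mode.*` fields report per-GPU state):

```shell
# Show current and pending MIG mode for each GPU
# ("Enabled"/"Disabled"; pending differs from current until the GPU is reset)
nvidia-smi --query-gpu=index,name,mig.mode.current,mig.mode.pending --format=csv
```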
MIG Partition Profiles
The A100 80GB supports these MIG profiles:
| Profile | GPU Memory | Compute (SMs) | Max Instances |
|---|---|---|---|
| 1g.10gb | 10 GB | 1/7 | 7 |
| 2g.20gb | 20 GB | 2/7 | 3 |
| 3g.40gb | 40 GB | 3/7 | 2 |
| 4g.40gb | 40 GB | 4/7 | 1 |
| 7g.80gb | 80 GB | 7/7 | 1 |
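Profile IDs and availability vary by GPU model, so it helps to list what your GPU actually supports before partitioning. A sketch using the `mig` subcommands of `nvidia-smi`:

```shell
# List the GPU instance profiles supported on GPU 0,
# including their numeric IDs and free/total instance counts
sudo nvidia-smi mig -i 0 -lgip

# Show the valid placements (slice offsets) for each profile
sudo nvidia-smi mig -i 0 -lgipp
```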
Enabling MIG in Kubernetes
Configure MIG manually with `nvidia-smi` as shown below, or let the GPU Operator's MIG manager apply the same layout declaratively:

```shell
# Enable MIG mode on GPU 0 (requires a GPU reset and a node drain)
sudo nvidia-smi -i 0 -mig 1

# Create GPU instances and their compute instances
# (example: 3 x 2g.20gb + 1 x 1g.10gb, using all 7 compute slices)
sudo nvidia-smi mig -i 0 -cgi 2g.20gb,2g.20gb,2g.20gb,1g.10gb -C

# Verify the GPU instances
nvidia-smi mig -lgi
```
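To change the layout later, the existing partitions must be torn down first. A sketch of the reverse sequence (compute instances are destroyed before their parent GPU instances):

```shell
# List compute instances, then destroy all compute and GPU instances on GPU 0
nvidia-smi mig -i 0 -lci
sudo nvidia-smi mig -i 0 -dci
sudo nvidia-smi mig -i 0 -dgi

# Optionally disable MIG mode again (takes effect after the next GPU reset)
sudo nvidia-smi -i 0 -mig 0
```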
Kubernetes MIG Strategies
The NVIDIA device plugin supports three MIG strategies:
- none: MIG is disabled. GPUs are exposed as whole devices.
- single: All MIG instances on a GPU must be the same profile. Resources are named nvidia.com/gpu.
- mixed: Different MIG profiles can coexist. Resources are named by profile, e.g., nvidia.com/mig-2g.20gb.
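When the device plugin is deployed via the GPU Operator, the strategy is set on the ClusterPolicy resource. A sketch assuming a default install where the policy object is named `cluster-policy`:

```shell
# Switch the device plugin to the mixed MIG strategy via the GPU Operator
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
  --type merge -p '{"spec": {"mig": {"strategy": "mixed"}}}'

# Confirm the node now advertises per-profile resources
kubectl describe node <node-name> | grep nvidia.com/mig
```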
```yaml
# Pod requesting a specific MIG profile (mixed strategy)
apiVersion: v1
kind: Pod
metadata:
  name: inference-small
spec:
  containers:
  - name: model
    image: nvcr.io/nvidia/pytorch:24.01-py3
    resources:
      limits:
        nvidia.com/mig-1g.10gb: 1  # Request a 1g.10gb MIG slice
```
Production tip: Use the mixed MIG strategy to run different workload sizes on the same GPU. Allocate larger profiles (3g.40gb) for training and smaller profiles (1g.10gb) for inference, maximizing GPU utilization across workload types.
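With the GPU Operator's MIG manager, the per-node layout is selected by a node label; `all-balanced` below is one of the named configurations shipped in the operator's default mig-parted config (a sketch — the available configuration names depend on your operator version):

```shell
# Ask the MIG manager to apply a balanced mixed layout on a node
kubectl label node <node-name> nvidia.com/mig.config=all-balanced --overwrite

# Watch the reconfiguration status reported by the MIG manager
# (nvidia.com/mig.config.state moves from pending to success)
kubectl get node <node-name> --show-labels | grep mig.config
```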
Lilly Tech Systems