NVLink/NVSwitch for AI (Intermediate)
NVLink is NVIDIA's proprietary high-speed GPU-to-GPU interconnect, and NVSwitch is the fabric chip that connects all GPUs within a node into a fully connected topology. Together, they provide GPU-to-GPU bandwidth roughly an order of magnitude higher than PCIe, enabling efficient multi-GPU training within a node before inter-node networking becomes the bottleneck.
NVLink Generations
| Generation | GPU | Bandwidth per Link | Total per GPU |
|---|---|---|---|
| NVLink 3.0 | A100 | 50 GB/s | 600 GB/s (12 links) |
| NVLink 4.0 | H100 | 50 GB/s | 900 GB/s (18 links) |
| NVLink 5.0 | B200 | 100 GB/s | 1800 GB/s (18 links) |
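The per-GPU totals in the table are simply the link count multiplied by the per-link bandwidth. A minimal sketch to verify that arithmetic, with the figures copied from the table above:

```python
# Per-GPU NVLink bandwidth = links per GPU x bandwidth per link (GB/s).
# Generation data copied from the table above.
generations = {
    "NVLink 3.0 (A100)": (12, 50),
    "NVLink 4.0 (H100)": (18, 50),
    "NVLink 5.0 (B200)": (18, 100),
}

def total_bandwidth(links: int, gbps_per_link: int) -> int:
    """Total per-GPU NVLink bandwidth in GB/s."""
    return links * gbps_per_link

for name, (links, bw) in generations.items():
    print(f"{name}: {total_bandwidth(links, bw)} GB/s")
```

Note that per-GPU bandwidth has grown across generations both by adding links (12 to 18) and, in NVLink 5.0, by doubling the per-link rate.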
NVSwitch Architecture
NVSwitch provides all-to-all GPU communication within a node:
- DGX A100 — 6 NVSwitches connecting 8 A100 GPUs with 600 GB/s of NVLink bandwidth per GPU
- DGX H100 — 4 NVSwitches connecting 8 H100 GPUs with 900 GB/s per GPU
- Full mesh — Every GPU can communicate with every other GPU at full bandwidth simultaneously
- No hop penalty — Unlike PCIe tree topology, NVSwitch provides uniform bandwidth between any GPU pair
NVLink vs PCIe for AI Training
- Bandwidth — NVLink 4.0 provides 900 GB/s of total bidirectional bandwidth per GPU, versus roughly 128 GB/s bidirectional (64 GB/s per direction) for a PCIe Gen5 x16 slot, about a 7x advantage
- Latency — NVLink has lower latency than PCIe for GPU-to-GPU transfers
- Tensor parallelism — Only practical over NVLink due to fine-grained, frequent communication between GPUs
- Data parallelism — Benefits from NVLink for intra-node all-reduce, uses InfiniBand for inter-node
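The split described above (tensor parallelism inside the NVLink domain, data parallelism across nodes) comes down to how global ranks are partitioned into process groups. A minimal sketch, assuming contiguous ranks per node and a tensor-parallel size that matches the NVLink domain; the names `build_groups`, `tp_groups`, and `dp_groups` are illustrative, not a specific framework's API:

```python
# Sketch: partition global ranks into tensor-parallel (intra-node) and
# data-parallel (inter-node) groups. Assumes ranks are assigned contiguously
# per node, so each TP group stays inside one NVLink/NVSwitch domain.

def build_groups(world_size: int, tensor_parallel_size: int):
    """Return (tp_groups, dp_groups) as lists of rank lists."""
    assert world_size % tensor_parallel_size == 0
    # Each TP group is a contiguous block of ranks on one node.
    tp_groups = [
        list(range(start, start + tensor_parallel_size))
        for start in range(0, world_size, tensor_parallel_size)
    ]
    # Each DP group takes the rank at the same local position in every TP group.
    dp_groups = [
        list(range(offset, world_size, tensor_parallel_size))
        for offset in range(tensor_parallel_size)
    ]
    return tp_groups, dp_groups

# Example: 2 nodes x 8 GPUs, TP size 8 (one NVLink domain per TP group).
tp, dp = build_groups(world_size=16, tensor_parallel_size=8)
print(tp[0])  # [0, 1, 2, 3, 4, 5, 6, 7] — share one node's NVSwitch fabric
print(dp[0])  # [0, 8] — gradient all-reduce crosses nodes over InfiniBand
```

In a real training framework these rank lists would be passed to the collective library (e.g. NCCL process-group creation) so that the frequent tensor-parallel collectives run over NVLink and only the gradient all-reduce crosses the slower inter-node fabric.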
Monitoring NVLink Health
```bash
# Check NVLink status and throughput
nvidia-smi nvlink -s -i 0

# Check NVLink error counters
nvidia-smi nvlink -e -i 0

# Run NVLink bandwidth test
/usr/local/cuda/samples/bin/p2pBandwidthLatencyTest
```
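For automated health checks, the status output can be parsed to flag links that are down. A minimal sketch, with the caveat that the exact `nvidia-smi nvlink -s` output format varies by driver version; the sample text below is illustrative, not authoritative:

```python
# Sketch: flag inactive NVLink links from `nvidia-smi nvlink -s` output.
# The output format differs across driver versions; adjust the parsing
# to match what your installed driver actually prints.

def inactive_links(nvlink_status: str) -> list[int]:
    """Return link indices whose status line reports <inactive>."""
    bad = []
    for line in nvlink_status.splitlines():
        line = line.strip()
        if line.startswith("Link") and "<inactive>" in line:
            # Lines look like "Link 3: <inactive>" in this sample format.
            bad.append(int(line.split()[1].rstrip(":")))
    return bad

# Illustrative sample output (not from a real device).
sample = """\
GPU 0: NVIDIA H100
\t Link 0: 26.562 GB/s
\t Link 1: 26.562 GB/s
\t Link 2: <inactive>
"""
print(inactive_links(sample))  # [2]
```

A script like this can run periodically on each node and alert when any link drops, since a single inactive link silently reduces per-GPU bandwidth and degrades collective performance.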
Architecture Tip: When designing parallelism strategies, use tensor parallelism within NVLink-connected GPU groups and data/pipeline parallelism across nodes. This maximizes the use of high-bandwidth NVLink for the most communication-intensive operations.
Ready to Learn Network Topology?
The next lesson covers network topology design for AI clusters.
Lilly Tech Systems