InfiniBand for AI (Intermediate)

InfiniBand is the dominant networking technology for AI training clusters, offering higher bandwidth and lower latency than Ethernet. This lesson covers InfiniBand architecture, speed generations (HDR, NDR, XDR), subnet management, and practical deployment considerations for AI infrastructure.

InfiniBand Speed Generations

Generation | Per-Lane Speed | 4x Port Speed | Typical AI Use
-----------|----------------|---------------|------------------
HDR        | 50 Gbps        | 200 Gbps      | A100 clusters
NDR        | 100 Gbps       | 400 Gbps      | H100 clusters
XDR        | 200 Gbps       | 800 Gbps      | Next-gen clusters
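The relationship between the columns is simple: a 4x port bundles four lanes, so the port speed is four times the per-lane rate. A minimal sketch with the table's values hard-coded for illustration:

```python
# Per-lane signaling rates (Gbps) for the InfiniBand generations above.
PER_LANE_GBPS = {"HDR": 50, "NDR": 100, "XDR": 200}

def port_speed_gbps(generation: str, lanes: int = 4) -> int:
    """A 4x port aggregates four lanes of the per-lane rate."""
    return PER_LANE_GBPS[generation] * lanes

print(port_speed_gbps("NDR"))  # 400
print(port_speed_gbps("XDR"))  # 800
```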

InfiniBand Architecture

  • Host Channel Adapter (HCA) — Network interface card installed in each GPU server (e.g., ConnectX-7)
  • InfiniBand switches — High-radix switches (Quantum-2 with 64 NDR ports) forming the fabric
  • Subnet Manager (SM) — Software that manages routing, discovers topology, and handles failover
  • Cables — Copper (up to 2m) or active optical cables (up to 100m) connecting nodes to switches
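Switch radix directly determines how large a fabric can grow. In a non-blocking two-tier fat tree built from r-port switches, each leaf uses half its ports for hosts and half for spine uplinks, so the fabric tops out at r²/2 hosts. A quick sketch, assuming 64-port Quantum-2-class switches:

```python
def two_tier_max_hosts(radix: int) -> int:
    """Non-blocking two-tier fat tree: each leaf switch dedicates
    half its ports to hosts and half to spine uplinks, supporting
    up to `radix` leaves -> radix * (radix // 2) hosts total."""
    return radix * (radix // 2)

print(two_tier_max_hosts(64))  # 2048
```

Clusters larger than this need a third tier, which adds a switch hop (and latency) to some paths.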

InfiniBand vs Ethernet for AI

Aspect             | InfiniBand NDR          | RoCE (RDMA over Converged Ethernet)
-------------------|-------------------------|------------------------------------
Bandwidth          | 400 Gbps                | 400 Gbps (800 GbE emerging)
Latency            | ~0.5 µs                 | ~1-2 µs
Congestion control | Credit-based (lossless) | PFC/ECN (complex to tune)
Cost               | Higher per port         | Lower; reuses existing Ethernet infrastructure
Ecosystem          | HPC-focused             | Broader enterprise compatibility
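The latency gap matters most for small messages, where per-message latency dominates serialization time. A toy transfer-time model using the numbers above (and ignoring protocol overheads, which real fabrics add):

```python
def transfer_time_us(msg_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """Toy model: one-way time = base latency + serialization time.
    Gbps * 1e3 converts bandwidth to bits per microsecond."""
    serialization_us = (msg_bytes * 8) / (bandwidth_gbps * 1e3)
    return latency_us + serialization_us

# 1 KiB message: latency-dominated, so the ~1 µs gap is a ~3x difference.
print(transfer_time_us(1024, 0.5, 400))   # InfiniBand NDR: ~0.52 µs
print(transfer_time_us(1024, 1.5, 400))   # RoCE:           ~1.52 µs

# 1 GiB message: serialization (~21 ms) dominates and the gap vanishes.
print(transfer_time_us(2**30, 0.5, 400))
```

This is why small-message-heavy collectives (e.g. allreduce at large scale) feel the latency difference more than bulk transfers do.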

Deployment Considerations

  • Dual-rail networking — Use two InfiniBand HCAs per node for redundancy and doubled bandwidth
  • NUMA awareness — Connect each HCA to the same NUMA node as its associated GPUs for optimal DMA performance
  • Subnet Manager placement — Run standby SMs on multiple switches for high availability
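The NUMA-awareness point above amounts to a pairing step: match each GPU to an HCA on the same NUMA node so DMA traffic avoids crossing the inter-socket link. A minimal sketch; the topology dictionaries are hypothetical (in practice this information comes from tools such as `nvidia-smi topo -m` or sysfs):

```python
# Hypothetical topology: device -> NUMA node (illustrative values only).
gpu_numa = {"gpu0": 0, "gpu1": 0, "gpu2": 1, "gpu3": 1}
hca_numa = {"mlx5_0": 0, "mlx5_1": 1}

def numa_local_hca(gpu: str) -> str:
    """Pick an HCA on the same NUMA node as the GPU so RDMA traffic
    stays local instead of traversing the inter-socket interconnect."""
    node = gpu_numa[gpu]
    for hca, hca_node in hca_numa.items():
        if hca_node == node:
            return hca
    raise LookupError(f"no NUMA-local HCA for {gpu}")

print(numa_local_hca("gpu0"))  # mlx5_0
print(numa_local_hca("gpu2"))  # mlx5_1
```

Communication libraries can then be steered to the chosen device (for example via NCCL's `NCCL_IB_HCA` environment variable).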

Pro Tip: When building a new AI cluster, choose the latest InfiniBand generation your budget allows. The bandwidth improvement translates directly into better distributed training scaling, and the cost difference is small relative to the GPU investment.

Ready to Learn RDMA?

The next lesson covers RDMA technology that enables zero-copy data transfers over InfiniBand and Ethernet.

Next: RDMA →