InfiniBand for AI (Intermediate)

InfiniBand is the dominant networking technology for AI training clusters, offering higher bandwidth and lower latency than Ethernet. This lesson covers InfiniBand architecture, speed generations (HDR, NDR, XDR), subnet management, and practical deployment considerations for AI infrastructure.

InfiniBand Speed Generations

Generation | Per-Lane Speed | 4x Port Speed | Typical AI Use
-----------|----------------|---------------|------------------
HDR        | 50 Gbps        | 200 Gbps      | A100 clusters
NDR        | 100 Gbps       | 400 Gbps      | H100 clusters
XDR        | 200 Gbps       | 800 Gbps      | Next-gen clusters
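The relationship between the columns is simple: a 4x port bundles four lanes, so the port speed is four times the per-lane rate. A minimal sketch with the table's values hard-coded for illustration:

```python
# Per-lane signaling rates (Gbps) for the InfiniBand generations above.
PER_LANE_GBPS = {"HDR": 50, "NDR": 100, "XDR": 200}

def port_speed_gbps(generation: str, lanes: int = 4) -> int:
    """A 4x port aggregates four lanes of the per-lane rate."""
    return PER_LANE_GBPS[generation] * lanes

print(port_speed_gbps("NDR"))  # 400
print(port_speed_gbps("XDR"))  # 800
```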

InfiniBand Architecture

  • Host Channel Adapter (HCA) — Network interface card installed in each GPU server (e.g., ConnectX-7)
  • InfiniBand switches — High-radix switches (Quantum-2 with 64 NDR ports) forming the fabric
  • Subnet Manager (SM) — Software that manages routing, discovers topology, and handles failover
  • Cables — Copper (up to 2m) or active optical cables (up to 100m) connecting nodes to switches
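Switch radix directly determines how large a fabric can grow. In a non-blocking two-tier fat tree built from r-port switches, each leaf uses half its ports for hosts and half for spine uplinks, so the fabric tops out at r²/2 hosts. A quick sketch, assuming 64-port Quantum-2-class switches:

```python
def two_tier_max_hosts(radix: int) -> int:
    """Non-blocking two-tier fat tree: each leaf switch dedicates
    half its ports to hosts and half to spine uplinks, supporting
    up to `radix` leaves -> radix * (radix // 2) hosts total."""
    return radix * (radix // 2)

print(two_tier_max_hosts(64))  # 2048
```

Clusters larger than this need a third tier, which adds a switch hop (and latency) to some paths.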

InfiniBand vs Ethernet for AI

Aspect             | InfiniBand NDR          | RoCE (RDMA over Converged Ethernet)
-------------------|-------------------------|------------------------------------
Bandwidth          | 400 Gbps                | 400 Gbps (800 GbE emerging)
Latency            | ~0.5 µs                 | ~1-2 µs
Congestion control | Credit-based (lossless) | PFC/ECN (complex to tune)
Cost               | Higher per port         | Lower; reuses existing Ethernet infrastructure
Ecosystem          | HPC-focused             | Broader enterprise compatibility
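The latency gap matters most for small messages, where per-message latency dominates serialization time. A toy transfer-time model using the numbers above (and ignoring protocol overheads, which real fabrics add):

```python
def transfer_time_us(msg_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """Toy model: one-way time = base latency + serialization time.
    Gbps * 1e3 converts bandwidth to bits per microsecond."""
    serialization_us = (msg_bytes * 8) / (bandwidth_gbps * 1e3)
    return latency_us + serialization_us

# 1 KiB message: latency-dominated, so the ~1 µs gap is a ~3x difference.
print(transfer_time_us(1024, 0.5, 400))   # InfiniBand NDR: ~0.52 µs
print(transfer_time_us(1024, 1.5, 400))   # RoCE:           ~1.52 µs

# 1 GiB message: serialization (~21 ms) dominates and the gap vanishes.
print(transfer_time_us(2**30, 0.5, 400))
```

This is why small-message-heavy collectives (e.g. allreduce at large scale) feel the latency difference more than bulk transfers do.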

Deployment Considerations

  • Dual-rail networking — Use two InfiniBand HCAs per node for redundancy and doubled bandwidth
  • NUMA awareness — Connect each HCA to the same NUMA node as its associated GPUs for optimal DMA performance
  • Subnet Manager placement — Run standby SMs on multiple switches for high availability
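The NUMA-awareness point above amounts to a pairing step: match each GPU to an HCA on the same NUMA node so DMA traffic avoids crossing the inter-socket link. A minimal sketch; the topology dictionaries are hypothetical (in practice this information comes from tools such as `nvidia-smi topo -m` or sysfs):

```python
# Hypothetical topology: device -> NUMA node (illustrative values only).
gpu_numa = {"gpu0": 0, "gpu1": 0, "gpu2": 1, "gpu3": 1}
hca_numa = {"mlx5_0": 0, "mlx5_1": 1}

def numa_local_hca(gpu: str) -> str:
    """Pick an HCA on the same NUMA node as the GPU so RDMA traffic
    stays local instead of traversing the inter-socket interconnect."""
    node = gpu_numa[gpu]
    for hca, hca_node in hca_numa.items():
        if hca_node == node:
            return hca
    raise LookupError(f"no NUMA-local HCA for {gpu}")

print(numa_local_hca("gpu0"))  # mlx5_0
print(numa_local_hca("gpu2"))  # mlx5_1
```

Communication libraries can then be steered to the chosen device (for example via NCCL's `NCCL_IB_HCA` environment variable).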

Pro Tip: When building a new AI cluster, choose the latest InfiniBand generation your budget allows. The bandwidth improvement translates directly into better distributed training scaling, and the cost difference is small relative to the GPU investment.

Ready to Learn RDMA?

The next lesson covers RDMA technology that enables zero-copy data transfers over InfiniBand and Ethernet.

Next: RDMA →