NVIDIA's Spectrum-X Ethernet fabric, now shipping with MRC, a multipath RDMA transport, is becoming the de facto backbone for gigascale AI clusters. It slashes tail latency by 30% while preserving full line-rate throughput, letting hyperscalers and cloud builders run distributed training jobs across 32,000 GPUs without the jitter that cripples alternative fabrics. The open, AI-native stack is already live in Microsoft Azure and Oracle Cloud, setting a new bar for what “good enough” networking looks like in the trillion-parameter era.
What MRC does
MRC is an RDMA transport protocol that distributes a single RDMA connection across multiple network paths. Instead of a single-lane road, it creates a street grid with real-time traffic rerouting. This improves throughput, load balancing, and availability for large-scale AI training fabrics. OpenAI, Microsoft, and Oracle have deployed MRC in production. OpenAI's Sachin Katti stated that MRC's end-to-end approach avoided typical network-related slowdowns and interruptions, maintaining the efficiency of frontier training runs at scale.
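The core idea above, one logical connection sprayed across many paths, can be sketched as a toy model. The class and field names below are hypothetical; MRC's actual wire format and hardware implementation are not described here. The sketch only shows the essential invariant: packets carry a connection-level sequence number, so they can take any path and still be delivered in order.

```python
import heapq

class MultipathSender:
    """Toy model of one logical connection sprayed across several paths.

    Names are illustrative, not MRC's real protocol structures.
    """
    def __init__(self, paths):
        self.paths = paths  # list of available path ids
        self.seq = 0        # sequence number over the whole connection

    def send(self, payload):
        # Spray packets round-robin across paths; each packet carries the
        # connection-level sequence number so the receiver can reorder
        # traffic that arrived on different paths.
        path = self.paths[self.seq % len(self.paths)]
        pkt = (self.seq, path, payload)
        self.seq += 1
        return pkt

class MultipathReceiver:
    """Reassembles in-order delivery from packets arriving on any path."""
    def __init__(self):
        self.expected = 0
        self.buffer = []  # min-heap keyed on sequence number

    def receive(self, pkt):
        seq, _path, payload = pkt
        heapq.heappush(self.buffer, (seq, payload))
        out = []
        # Release the longest contiguous run starting at `expected`.
        while self.buffer and self.buffer[0][0] == self.expected:
            out.append(heapq.heappop(self.buffer)[1])
            self.expected += 1
        return out  # payloads now deliverable in order
```

Even if a slow path delivers its packets late, the receiver holds them only until the gap fills, which is why multipath spraying can raise throughput without breaking RDMA's ordering guarantees.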
How it works
MRC delivers high GPU utilization by load-balancing traffic across all available paths, ensuring every GPU gets the bandwidth it needs throughout a training run. It sustains high bandwidth even under congestion by dynamically avoiding overloaded paths in real time. When data loss occurs, intelligent retransmission enables rapid, precise recovery, minimizing the impact of short-lived interruptions to long-running jobs and helping avoid GPU idle time. Administrators gain fine-grained visibility and control over traffic paths, simplifying operations and accelerating troubleshooting at scale.
Failure bypass and multiplane designs
MRC's failure bypass technology detects a network path failure in microseconds and reroutes traffic automatically in hardware. This matters for AI training clusters where thousands of GPUs must stay synchronized, as even a brief network disruption can slow or interrupt an entire training job. Spectrum-X Ethernet prevents that by responding at hardware speed.
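The failure-bypass behavior can be sketched as a per-path liveness check consulted on every send, so traffic re-steers the instant a path dies. This is illustrative only: real Spectrum-X switches do this in the data plane at hardware speed, not in host software, and all names below are hypothetical.

```python
class FailoverRouter:
    """Sketch of failure bypass: a liveness bit per path, checked on
    every send. Illustrative; the real mechanism lives in switch ASICs."""
    def __init__(self, paths):
        self.paths = list(paths)
        self.alive = {p: True for p in paths}
        self.rr = 0  # round-robin counter over live paths

    def path_down(self, path):
        # In hardware this is triggered within microseconds, e.g. by
        # loss-of-signal detection.
        self.alive[path] = False

    def path_up(self, path):
        self.alive[path] = True

    def route(self):
        # Round-robin over live paths only; the connection never stalls
        # as long as one path survives.
        live = [p for p in self.paths if self.alive[p]]
        if not live:
            raise RuntimeError("all paths failed")
        path = live[self.rr % len(live)]
        self.rr += 1
        return path
```

Because rerouting is a lookup rather than a connection teardown and re-establishment, a synchronized training job sees at most a momentary blip instead of a stall across thousands of GPUs.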
Another innovation is the multiplane network design, which OpenAI deploys with Spectrum-X Ethernet in conjunction with MRC. A multiplane network consists of multiple independent network fabrics, or planes, each providing an alternate communication path between GPUs. The NVIDIA Spectrum-X Multiplane capability supports hardware-accelerated load balancing across the planes, boosting resiliency and scale without sacrificing performance. This keeps latencies predictably low while scaling to hundreds of thousands of GPUs.
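A minimal way to see how independent planes share load is deterministic flow-to-plane hashing, sketched below. The plane count and function name are assumptions for illustration; the real plane selection is hardware-accelerated and its policy is not public here.

```python
import zlib

NUM_PLANES = 4  # hypothetical number of independent fabric planes

def plane_for_flow(src_gpu: int, dst_gpu: int,
                   num_planes: int = NUM_PLANES) -> int:
    """Deterministically spread GPU-to-GPU flows across planes.

    Toy stand-in for hardware plane selection: hashing keeps a flow
    pinned to one plane (preserving ordering) while different flows
    land on different planes, sharing the aggregate load.
    """
    key = f"{src_gpu}->{dst_gpu}".encode()
    return zlib.crc32(key) % num_planes
```

Because each plane is an independent fabric, a failure in one plane only touches the flows hashed onto it; the remaining planes keep carrying their traffic, which is the resiliency property the design is built around.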
Open standard and ecosystem
MRC was first proven in production on NVIDIA Spectrum-X Ethernet hardware and has now been released as an open specification through the Open Compute Project. NVIDIA collaborated on MRC development with AMD, Broadcom, Intel, Microsoft, and OpenAI. Customers can choose between Spectrum-X Ethernet Adaptive RDMA and MRC protocols, as well as other custom protocols, all running natively across NVIDIA ConnectX SuperNICs and Spectrum-X Ethernet switches.
Bottom line
Spectrum-X Ethernet with MRC is the network fabric that lets hyperscalers build AI factories at gigascale without the jitter and downtime that plague alternative approaches. It's open, resilient, and already in production at the largest AI training clusters in the world.