TECHNICAL

Quantum-X800 versus Spectrum-X: InfiniBand or Ethernet for GPU clusters.

Apr 11, 2026 | 9 min read | Rillor Research

The fabric you wrap around a GPU cluster is not a downstream decision. It is set the moment you pick the system, because the network interface ships soldered or socketed onto the OEM baseboard, and the switch tier you can fully drive is dictated by which NIC generation came with that platform. By the time you are contracting an HGX B200 NVL8 system or a GB300 rack, the InfiniBand-or-Ethernet question has already been mostly answered by the bill of materials. So architects need to make the call deliberately, up front, before the contract is executed.

This is a comparison of NVIDIA's two cluster-fabric paths for large training and inference deployments: Quantum-X800 InfiniBand against Spectrum-X Ethernet. Both were built for the same job, moving collective-communication traffic across thousands of accelerators at 800 Gb/s, and both were announced as end-to-end 800G platforms. They differ in how they route around congestion, what they offload into the switch, and what it costs to staff and operate them. We will walk the switch silicon, the NIC pairing, the congestion-control models, and the operational fit, then map all of it back to the OEM system you actually sign for.

The switch silicon

Start with the boxes that move the packets, because the rest of the decision hangs off their generation.

On the InfiniBand side, Quantum-X800 is the current 800G XDR generation. The platform provides 144 ports of 800 Gb/s connectivity, and it adds hardware in-network compute through SHARP v4 along with adaptive routing and telemetry-based congestion control. The generation it replaces is Quantum-2, whose QM9700 switch is a 1U 400G NDR box: 64 ports of 400 Gb/s on 32 OSFP connectors, 51.2 Tb/s of aggregate bidirectional throughput, roughly 66.5 billion packets per second, with RDMA, adaptive routing, and SHARP already present at the prior tier. If you are reading a QM9700 spec sheet today, you are reading the previous generation. That matters for forward planning: a cluster you contract for delivery later this year should be specified against XDR, not NDR, unless there is a deliberate reason to stay on 400G.

On the Ethernet side, Spectrum-X is anchored by the SN5600, a Spectrum-4-based 2U switch with 64 ports of 800GbE on OSFP and 51.2 Tb/s of throughput. It is a standards-based Ethernet switch with Spectrum-X extensions layered on top to make RDMA over Converged Ethernet (RoCE) behave like a lossless fabric rather than best-effort Ethernet. The comparison is not InfiniBand versus generic datacenter Ethernet. It is InfiniBand versus an Ethernet fabric that NVIDIA has specifically tuned for AI collectives.

144 x 800G: Quantum-X800 XDR port count
64 x 800GbE: SN5600 Spectrum-4 ports
51.2 Tb/s: SN5600 switch throughput
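
The port arithmetic behind these figures is worth checking by hand, because the two 51.2 Tb/s numbers are not counted the same way. Below is a minimal Python sketch, assuming the SN5600 figure counts each port once (which is how 64 x 800 Gb/s works out). The function name is illustrative, and the Quantum-X800 total is derived from the port count rather than quoted from a spec sheet.

```python
def aggregate_tbps(ports: int, gbps_per_port: int, bidirectional: bool = False) -> float:
    """Sum of port rates in Tb/s, doubled when both directions are counted."""
    total_gbps = ports * gbps_per_port * (2 if bidirectional else 1)
    return total_gbps / 1000


# QM9700: 64 x 400 Gb/s, quoted as aggregate bidirectional throughput.
qm9700_tbps = aggregate_tbps(64, 400, bidirectional=True)

# SN5600: 64 x 800GbE, each port counted once (assumption, see lead-in).
sn5600_tbps = aggregate_tbps(64, 800)

# Quantum-X800: 144 x 800 Gb/s, derived here for comparison only.
x800_tbps = aggregate_tbps(144, 800)

print(qm9700_tbps, sn5600_tbps, x800_tbps)
```

Under those assumptions the QM9700 and SN5600 land on the same headline number even though the SN5600 carries twice the per-port rate, so compare port count and per-port speed rather than the aggregate alone.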

The NIC pairing decides what you can drive

A switch is only half the fabric. The NIC at the host end of every link sets the line rate you can actually achieve, and on these platforms the NIC is part of the system you contract, not a part you bolt on afterward.

The ConnectX-8 SuperNIC is the 800G adapter built for this generation. It is a PCIe Gen6 device with 48 lanes and a built-in PCIe switch, and it serves both fabrics: the same silicon can sit on a Quantum-X800 InfiniBand network or a Spectrum-X Ethernet network. It doubles the 400G data rate of the prior ConnectX-7 generation. ConnectX-7, by contrast, is a 400G NDR InfiniBand or 400GbE adapter, and it is exactly the part you find integrated into branded servers today, sold by OEMs such as Lenovo as ThinkSystem options in both air-cooled and direct-water-cooled variants.

The practical rule is simple. A system built around ConnectX-7 tops out at 400G per port, which means a Quantum-2 QM9700 NDR fabric or 400GbE. A system built around ConnectX-8 can saturate the 800G tier, Quantum-X800 XDR or the SN5600 Spectrum-X path. You do not mix and match freely after the fact, because the NIC is provisioned on the baseboard with the GPUs. If you want 800G end to end, you specify a ConnectX-8 platform from the start.

| Layer | 400G generation | 800G generation |
| --- | --- | --- |
| InfiniBand switch | Quantum-2 QM9700 (NDR) | Quantum-X800 (XDR) |
| Ethernet switch | Spectrum-3 / earlier | SN5600 (Spectrum-4) |
| Host NIC | ConnectX-7 | ConnectX-8 SuperNIC |
| Per-port rate | 400 Gb/s | 800 Gb/s |
| Host interconnect | PCIe Gen5 class | PCIe Gen6 |
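
The pairing rule can be written down as a small lookup. This is an illustrative sketch only: the `NicTier` type, the `NIC_TIERS` table, and the `fabric_options` helper are hypothetical names, and the values are simply the generation pairings described in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NicTier:
    port_gbps: int          # per-port line rate the NIC can drive
    infiniband_switch: str  # matching InfiniBand switch tier
    ethernet_switch: str    # matching Ethernet switch tier


# Pairings as described in the text; not an exhaustive compatibility matrix.
NIC_TIERS = {
    "ConnectX-7": NicTier(400, "Quantum-2 QM9700 (NDR)", "400GbE (Spectrum-3 or earlier)"),
    "ConnectX-8": NicTier(800, "Quantum-X800 (XDR)", "SN5600 (Spectrum-4)"),
}


def fabric_options(nic: str) -> NicTier:
    """Return the fabric ceiling implied by the NIC that ships on the platform."""
    return NIC_TIERS[nic]


for nic in NIC_TIERS:
    tier = fabric_options(nic)
    print(f"{nic}: {tier.port_gbps} Gb/s -> {tier.infiniband_switch} or {tier.ethernet_switch}")
```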

This is also why fabric is a contracting decision rather than a deployment detail. When you select a Supermicro SYS-A22GA-NBRT, a Gigabyte G894-AD1-AAX5, a Dell PowerEdge XE9780, an HPE ProLiant Compute XD685, or a Lenovo SR680a V3, you are also selecting the NIC tier that ships on it, and therefore the ceiling of the fabric you can build. The rack-scale interconnect picture, including how ConnectX generations and NVLink5 fit together, is laid out in "GB200 NVL72 versus GB300 NVL72 at rack scale."

How each fabric handles congestion

The most consequential technical difference between the two paths is what happens when many GPUs try to talk at once, which during a training run is most of the time.

InfiniBand: deterministic by design

InfiniBand is a credit-based, lossless transport from the ground up. The fabric does not drop packets to signal congestion; flow control prevents a sender from transmitting until the receiver has buffer credit. Layered on top, Quantum adaptive routing spreads traffic across available paths to avoid hot links, and telemetry-based congestion control reacts to building pressure. The result is low jitter and predictable tail latency, which is the property that matters for synchronous collectives, where the whole job waits on the slowest link in an all-reduce.
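
To make the credit mechanism concrete, here is a toy single-link model in Python. It is a sketch of the general idea of credit-based flow control, not of the InfiniBand link protocol itself; the class, the buffer count, and the per-tick send and drain rates are all illustrative assumptions.

```python
from collections import deque


class CreditLink:
    """Toy model of one link where the receiver grants one credit per free buffer."""

    def __init__(self, receiver_buffers: int):
        self.credits = receiver_buffers
        self.rx_queue = deque()
        self.max_queue = 0

    def try_send(self, packet) -> bool:
        if self.credits == 0:
            return False                  # sender waits; the packet is not dropped
        self.credits -= 1
        self.rx_queue.append(packet)
        self.max_queue = max(self.max_queue, len(self.rx_queue))
        return True

    def drain(self, n: int) -> int:
        """Receiver consumes up to n packets and returns one credit for each."""
        done = 0
        while self.rx_queue and done < n:
            self.rx_queue.popleft()
            self.credits += 1
            done += 1
        return done


def simulate(total=10, buffers=4, send_per_tick=3, drain_per_tick=1):
    link = CreditLink(buffers)
    sent = delivered = stalls = 0
    while delivered < total:
        for _ in range(send_per_tick):
            if sent == total:
                break
            if link.try_send(sent):
                sent += 1
            else:
                stalls += 1               # back-pressure instead of loss
        delivered += link.drain(drain_per_tick)
    return {"delivered": delivered, "stalls": stalls, "max_queue": link.max_queue}


result = simulate()
print(result)
```

The sender outpaces the receiver, so it stalls, but the receive queue never exceeds the buffer count and every packet arrives. That is the property the paragraph above describes: congestion shows up as waiting, not as drops and retransmits.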

The other InfiniBand-specific lever is in-network compute. SHARP performs reductions inside the switch fabric rather than only at the endpoints, so an all-reduce can be partially executed as data passes through the network. With SHARP v4, NVIDIA cites a roughly 9x increase to about 14.4 TFLOPS of in-network compute versus the prior generation. For large, communication-bound training, moving part of the collective into the switch reduces the volume that has to traverse host links and shortens the critical path.
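
A back-of-the-envelope comparison shows why this helps. The sketch below uses the standard cost model for a ring all-reduce and an idealized model of switch-side aggregation in which each host sends its buffer into the fabric once. It does not model SHARP's actual protocol, and the function names are illustrative.

```python
def ring_allreduce_bytes_sent(n_hosts: int, msg_bytes: float) -> float:
    """Bytes each host transmits in a ring all-reduce (reduce-scatter + all-gather)."""
    return 2 * (n_hosts - 1) / n_hosts * msg_bytes


def in_network_allreduce_bytes_sent(msg_bytes: float) -> float:
    """Idealized in-network reduction: each host sends its buffer up once."""
    return float(msg_bytes)


def ring_steps(n_hosts: int) -> int:
    """Sequential communication steps on the ring's critical path."""
    return 2 * (n_hosts - 1)


msg = 1_000_000_000  # a hypothetical 1 GB gradient buffer
for n in (8, 64, 512):
    ring = ring_allreduce_bytes_sent(n, msg)
    tree = in_network_allreduce_bytes_sent(msg)
    print(f"{n:4d} hosts: ring sends {ring / msg:.3f}x the buffer, "
          f"in-network sends {tree / msg:.3f}x, ring steps = {ring_steps(n)}")
```

In this model the per-host send volume of the ring approaches twice the buffer size as the host count grows, while the in-network case stays at one buffer, and the ring's step count grows linearly with the number of hosts.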

Ethernet: engineered to behave like InfiniBand

Spectrum-X does not get losslessness for free, so NVIDIA engineered it in. Congestion control on Spectrum-X is an end-to-end design: the Spectrum-4 switches supply real-time telemetry, and the ConnectX or BlueField SuperNIC consumes that telemetry to manage how fast each sender injects traffic. NVIDIA describes this loop as processing millions of congestion-control events per second with microsecond-level response, paired with RoCE adaptive routing that balances data paths to prevent flow collisions. The intent is to give standard Ethernet the lossless, low-tail-latency behavior that AI collectives need, without abandoning the Ethernet ecosystem.
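
The shape of that loop can be illustrated with a generic feedback controller. The sketch below is a plain additive-increase, multiplicative-decrease scheme driven by a congestion signal from the switch. It is not NVIDIA's algorithm, and the step size, backoff factor, and function name are assumptions chosen only to show telemetry feeding sender rate.

```python
def run_control_loop(n_senders=4, link_gbps=800.0, steps=200,
                     add_gbps=10.0, backoff=0.8):
    """Senders share one egress link and adjust injection rate from switch telemetry."""
    rates = [link_gbps] * n_senders        # incast: everyone starts at line rate
    history = []
    for _ in range(steps):
        offered = sum(rates)
        history.append(offered)
        congested = offered > link_gbps    # stand-in for switch telemetry
        if congested:
            rates = [r * backoff for r in rates]                      # back off
        else:
            rates = [min(link_gbps, r + add_gbps) for r in rates]     # probe upward
    return rates, history


rates, history = run_control_loop()
print(f"first offered load: {history[0]:.0f} Gb/s")
print(f"steady-state range: {min(history[20:]):.0f} to {max(history[20:]):.0f} Gb/s")
```

With these parameters the offered load starts at four times the link rate and settles into a band around link capacity within a few iterations. The real system runs this kind of feedback in hardware at microsecond timescales, which is what keeps switch queues short enough for the fabric to behave as lossless.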

The honest framing for an architect: InfiniBand delivers determinism as an intrinsic property of the transport, while Spectrum-X delivers comparable behavior through a tightly coupled switch-and-SuperNIC control loop. Both work at scale. The InfiniBand version has fewer moving parts to reason about for the latency-critical path. The Ethernet version keeps you inside a fabric your network team already understands at the cable, the optics, and the monitoring layer.

In-network compute and ecosystem, weighed against each other

Reduce the two fabrics to their genuine differentiators and the choice gets clearer.

InfiniBand's case rests on two things. First, in-network compute through SHARP, which is a real architectural advantage for communication-bound training at scale and has no exact Ethernet equivalent. Second, deterministic latency from a transport that was lossless from inception. If your workload is large-scale synchronous training where collective performance is the bottleneck, these are not marginal. They are the reason the largest training clusters have historically run InfiniBand.

Ethernet's case rests on breadth. It is the fabric the rest of the datacenter already speaks. Optics, transceivers, monitoring tooling, and operational runbooks are commodity. There is a deep pool of engineers who operate Ethernet daily and a far shallower pool who operate InfiniBand subnet managers and fabric diagnostics. When a cluster has to coexist with storage networks, north-south traffic, and a multi-tenant control plane, an Ethernet fabric folds into the existing operational model instead of standing up a parallel one.

Both paths reach 800G end to end with the same NIC family, the ConnectX-8 SuperNIC, and the same broad set of system-vendor partners ship both. The X800 platform partner list includes Aivres, Dell Technologies, Hewlett Packard Enterprise, Lenovo, and Supermicro, so the OEM you prefer will offer you a system on either fabric. The choice is not constrained by who you buy from. It is constrained by your workload profile and your operations team.

Operational cost and staffing

The total cost of a fabric is not the switch capex. It is the switch capex plus the people who run it, and that second term is where the two paths diverge most for an organization that is not already an InfiniBand shop.

InfiniBand operations require specific expertise: subnet manager configuration, fabric topology and routing verification, and a diagnostic toolchain distinct from anything in the Ethernet world. That expertise commands a premium and is harder to hire. The payoff is a fabric purpose-built for the workload and, at the largest scale, fewer surprises in collective performance. Ethernet operations draw on a far larger labor pool and reuse the tooling, alerting, and on-call structure a team already runs for the rest of its network. Spectrum-X narrows the technical gap to InfiniBand while keeping the operational model familiar, which is precisely its strategic point.

A defensible heuristic: if you are running at the scale where SHARP and deterministic tail latency move the needle on training throughput, and you have or can hire InfiniBand expertise, Quantum-X800 is the stronger fabric. If you are at meaningful but not frontier scale, want to keep one operational model across the datacenter, or are staffing-constrained, Spectrum-X gets you most of the way with an Ethernet team. Neither is a downgrade. They are different points on the determinism-versus-breadth curve.
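
Written as code, the heuristic looks something like the sketch below. The field names are hypothetical, and the precedence is an assumption: a stated preference for a single operational model is treated as overriding the InfiniBand case. Adjust it to your own priorities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    collective_bound_at_scale: bool   # SHARP and tail latency move training throughput
    infiniband_staff_available: bool  # team has, or can hire, InfiniBand expertise
    wants_single_ops_model: bool      # one operational model across the datacenter


def recommend_fabric(d: Deployment) -> str:
    """Encode the heuristic from the text; precedence between rules is an assumption."""
    if (d.collective_bound_at_scale
            and d.infiniband_staff_available
            and not d.wants_single_ops_model):
        return "Quantum-X800"
    return "Spectrum-X"


frontier_lab = Deployment(True, True, False)
enterprise = Deployment(False, False, True)
print(recommend_fabric(frontier_lab))
print(recommend_fabric(enterprise))
```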

Mapping the fabric to the system you contract

This is the step architects most often defer and should not, because on Rillor the fabric is part of the asset you are buying forward.

Every standardized forward contract on a complete OEM GPU system carries the NIC tier in its specification. A contract on an HGX B200 NVL8 platform from Supermicro, Gigabyte, Dell, HPE, Lenovo, Aivres, or ASRock Rack specifies whether it ships with ConnectX-7 or ConnectX-8, and that single line determines whether you can stand up an 800G Quantum-X800 or Spectrum-X fabric or are capped at 400G. The CPU head node (Intel Xeon 6980P or AMD EPYC Turin), the BlueField-3 DPU, and the NIC generation are all part of the system of record, captured in the contract and at delivery. You are not buying GPUs and sorting out networking later. You are buying a fully specified platform, fabric ceiling included.

That is why we treat fabric as a procurement decision. When you browse systems on the marketplace or read the per-SKU pages in the catalog, the NIC and interconnect specification is part of what you are pricing. Get the fabric decision right before execution and the cluster comes up the way it was designed. Get it wrong and you have contracted a 400G platform for an 800G ambition, a mismatch that is expensive to unwind once systems are on the floor.
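
A pre-execution check for that mismatch can be very small. The sketch below is hypothetical throughout: the `ContractLine` type, the SKU labels, and the `fabric_mismatches` helper are invented for illustration and do not reflect any real contract schema or the NIC that ships on any particular OEM model.

```python
from dataclasses import dataclass

# Per-port rates for the two NIC generations discussed in this article.
NIC_PORT_GBPS = {"ConnectX-7": 400, "ConnectX-8": 800}


@dataclass(frozen=True)
class ContractLine:
    sku: str   # placeholder label, not a real OEM model number
    nic: str   # NIC generation named in the contract specification


def fabric_mismatches(lines, target_gbps: int):
    """Return the SKUs whose NIC tier cannot drive the target per-port rate."""
    return [line.sku for line in lines if NIC_PORT_GBPS[line.nic] < target_gbps]


order = [
    ContractLine("system-A", "ConnectX-8"),
    ContractLine("system-B", "ConnectX-7"),
]
print(fabric_mismatches(order, target_gbps=800))
```

Running a check like this against the contract specification before execution is cheaper than discovering the 400G ceiling once systems are on the floor.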

For buyers planning a multi-rack buildout, the fabric choice also interacts with how you commit capacity over time, which is part of the broader procurement picture in "read this if you're procuring B200 systems in 2026." And because the network tier is one more dimension where a single vendor can lock you in, it is worth reading alongside "how Rillor works for buyers," where sourcing across multiple OEMs is the point. The forward market exists so that these decisions are made on transparent terms, with verified counterparties and independent escrow, rather than negotiated one rack at a time.

The bottom line for architects

Quantum-X800 and Spectrum-X are both 800G fabrics built for the same accelerators, served by the same ConnectX-8 SuperNIC, and offered by the same OEMs. The decision comes down to two questions. Does your workload need the in-network compute and intrinsic determinism of InfiniBand badly enough to justify dedicated fabric expertise? Or does the breadth and operational familiarity of a tuned Ethernet fabric fit your team and your datacenter better? Answer those, confirm the NIC tier on the OEM system you intend to contract, and the rest of the architecture follows.

Whichever way you go, lock the specification into the contract before you build, not after.

GET ACCESS

Trade the forward curve on Rillor.

Rillor is invite only. Verified buyers and sellers transact standardized forward contracts on OEM GPU systems, with physical delivery and independent escrow on every contract.

Become a Partner