In AI clusters, the frequent gradient synchronization and parameter exchange between compute nodes (GPUs) place strict demands on the network: low latency, ultra-high bandwidth, and highly efficient heat dissipation in the optics.
The following are the most representative 200G QSFP transceiver choices and their applicable scenarios in AI clusters:
Core recommendation: 200G QSFP-DD (NRZ scheme)
Before the widespread adoption of NVIDIA NDR (400G), 200G InfiniBand (HDR) was the gold standard for AI clusters, and the QSFP-DD package was one of its most sought-after options.
Typical model: Mellanox/NVIDIA compatible 200G QSFP-DD SR8
Reason for recommendation:
- Extremely low latency: these modules use 8x25G NRZ modulation. Unlike PAM4, NRZ needs no complex DSP processing, which cuts per-module processing latency by roughly 100 ns and matters for the communication efficiency of AI training (a rough sketch of how this adds up follows this list).
- Low power consumption: with no DSP, module power is typically around 4 W, easing the heat dissipation pressure on large-scale racks.
- Application scenario: connecting NVIDIA A100 GPU servers to HDR InfiniBand switches.
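To make the latency point concrete, here is a minimal back-of-envelope sketch (Python) of how a roughly 100 ns per-module DSP penalty could accumulate across a multi-hop fabric. The hop count, module counts, and message counts are illustrative assumptions, not measured values.

```python
# Back-of-envelope sketch only: how a ~100 ns per-module DSP penalty (PAM4)
# could accumulate across a fabric, versus DSP-free NRZ modules.
# All counts below are illustrative assumptions, not measurements.

DSP_LATENCY_NS = 100          # extra latency per PAM4 module traversal (figure from the text)
MODULES_PER_LINK = 2          # one optical module at each end of a link
LINKS_ONE_WAY = 4             # assumed path: NIC -> leaf -> spine -> leaf -> NIC

extra_per_message_ns = DSP_LATENCY_NS * MODULES_PER_LINK * LINKS_ONE_WAY
print(f"Extra one-way latency with PAM4 optics: {extra_per_message_ns} ns")

# For latency-bound collective traffic, the penalty repeats on every message.
MESSAGES_PER_STEP = 10_000    # assumed small synchronization messages per training step
overhead_ms = extra_per_message_ns * MESSAGES_PER_STEP / 1e6
print(f"Hypothetical per-step overhead: {overhead_ms:.1f} ms")
```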
High-performance pick: 200G OSFP (integrated heat-sink design)
If your AI cluster is extremely dense or is evolving toward 400G/800G, OSFP is the forward-looking choice.
Typical model: 200G OSFP SR4 (PAM4)
Reason for recommendation:
- The king of heat dissipation: 200G OSFP transceivers have built-in heat-dissipation fins, giving far better thermal performance than QSFP. With GPUs at full load the air at the rear of the cabinet gets extremely hot, and OSFP keeps the optical module from throttling or dropping packets due to overheating (a rough rack-level heat estimate follows this list).
- Forward compatibility: OSFP was originally designed with 800G in mind, so choosing the OSFP form factor makes later upgrades to higher bandwidth smoother.
- Application scenarios: high-performance liquid-cooled server clusters and next-generation AI computing centers.
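The thermal argument is easier to see with a rough rack-level estimate. The sketch below multiplies the ~4 W module figure quoted earlier by a hypothetical port count; the switch port count and switches-per-rack figures are assumptions for illustration only.

```python
# Illustrative arithmetic only: why module-level heat matters at rack scale.
# Per-module power follows the ~4 W NRZ figure quoted above; the port and
# switch counts are assumptions, and DSP-based PAM4 modules draw more.

MODULE_POWER_W = 4.0        # per-module power from the text (NRZ, no DSP)
PORTS_PER_SWITCH = 40       # assumed 200G port count of a ToR/leaf switch
SWITCHES_PER_RACK = 2       # assumed

optics_heat_w = MODULE_POWER_W * PORTS_PER_SWITCH * SWITCHES_PER_RACK
print(f"Optics alone add roughly {optics_heat_w:.0f} W of heat per rack")
# OSFP's integrated fins are aimed at shedding exactly this kind of
# concentrated heat at the switch faceplate without throttling.
```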
Example networking solutions
Scenario A: Direct connection of servers in the cabinet (GPU to ToR switch)
- Solution: 200G QSFP56 DAC Cable (high-speed copper cable)
- Reason: Over 1-2 meters, a 200G DAC is effectively a zero-latency, zero-power option. In an AI cluster, every nanosecond of latency removed helps shorten model training time; a rough comparison with optics is sketched below.
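As a rough comparison (not a measurement), the sketch below contrasts a passive DAC with PAM4 optics over the same 2 m link. The ~5 ns/m propagation value is a standard approximation, and the ~100 ns DSP penalty follows the figure quoted earlier.

```python
# Rough sketch, not a benchmark: added link latency of a 2 m passive DAC
# versus PAM4 optics over the same distance. ~5 ns/m propagation is a
# common approximation; the ~100 ns DSP figure follows the text above.

LENGTH_M = 2.0
PROP_NS_PER_M = 5.0              # signal propagation, roughly 5 ns per meter
DSP_NS_PER_MODULE = 100          # assumed PAM4 DSP penalty per module

dac_ns = LENGTH_M * PROP_NS_PER_M                              # passive DAC: propagation only
optics_ns = LENGTH_M * PROP_NS_PER_M + 2 * DSP_NS_PER_MODULE   # DSP at both ends

print(f"Passive 2 m DAC : ~{dac_ns:.0f} ns")
print(f"PAM4 optics, 2 m: ~{optics_ns:.0f} ns")
```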
Scenario B: Cross-cabinet interconnection (compute nodes to spine switches)
- Solution: 200G QSFP56 AOC Cable or 200GBASE-SR4 QSFP56 module
- Reason: Link distances run from 3 to 100 meters. 200GBASE-SR4 uses MPO-12 fiber jumpers for flexible cabling. Although PAM4 adds a little DSP latency, it is the most cost-effective choice for large-scale Ethernet (RoCE v2) fabrics; see the rough arithmetic below.
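A hedged bit of arithmetic shows why the PAM4 DSP penalty matters less at cross-cabinet distances: fiber propagation delay grows with length and increasingly dominates the fixed DSP cost. The distances and DSP figure follow the text; the ~5 ns/m value is a standard approximation.

```python
# Hedged arithmetic: at cross-cabinet distances, the fixed PAM4 DSP penalty
# shrinks as a share of total link latency, since fiber propagation delay
# grows with distance. Values follow the text plus a ~5 ns/m approximation.

DSP_NS = 2 * 100                 # DSP at both ends of the link (assumed PAM4)
for distance_m in (3, 30, 100):
    propagation_ns = distance_m * 5.0
    total_ns = propagation_ns + DSP_NS
    share = DSP_NS / total_ns * 100
    print(f"{distance_m:>3} m: propagation {propagation_ns:>5.0f} ns, "
          f"DSP {DSP_NS} ns ({share:.0f}% of the total)")
```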
Conclusion
In A100-based AI clusters, 200G transceiver modules are typically paired with InfiniBand switches such as the QM8790. As the technology evolves, higher-end AI clusters built on H100/GH200 are gradually moving to 400G and 800G transceivers.
Yingda has summarized the following solutions for reference:
| AI cluster requirement | Recommended solution | Core value |
| --- | --- | --- |
| Ultra-low latency (HPC/IB) | QSFP-DD (NRZ) | No DSP delay at the physical layer; fastest synchronization |
| Extreme heat dissipation (high-density GPU) | OSFP | Integrated heat sink for more stable operation |
| Best cost-effectiveness (Ethernet) | QSFP56 SR4 | Standards-based and widely compatible; moderate cost |
| Short-distance interconnection (<2 m) | 200G DAC | Optimal on both cost and latency |
| Short-to-medium distance (3-100 m) | 200G AOC | Cost-effective reach at scale in Ethernet fabrics |