Broadcom’s Tomahawk Ultra asks: Who needs UALink when you have Ethernet?

While chip vendors like AMD are closing the gap on Nvidia in terms of GPU FLOPS and memory bandwidth, their ability to scale performance is still limited.

Nvidia’s NVLink interconnect technology has allowed it to build rack-scale systems with 72 GPUs, while Intel and AMD remain stuck at eight. Many in the industry have thrown their support behind the emerging Ultra Accelerator Link (UALink) protocol, an open alternative to Nvidia’s NVLink.

However, not everyone agrees that a new protocol or new hardware is needed. Broadcom, an early proponent of UALink, now believes Ethernet can do the job faster. Speaking to El Reg, Pete Del Vecchio, product line manager for Broadcom’s Tomahawk range, would not rule out the possibility that Broadcom could adopt UALink in the future, and the company has not yet handed back its UALink membership card. As things stand, however, UALink is not on the horizon, he said.

“Our position is you don’t need to have some spec that’s under development that maybe you’ll have a chip a couple of years from now,” Del Vecchio said.

Instead, Broadcom is pushing forward with a competing technology it calls scale-up Ethernet, or SUE. Broadcom claims the technology will support scale-up systems with at least 1,024 accelerators using any Ethernet platform. By comparison, Nvidia’s NVLink switch technology can support up to 576 accelerators, though we are not aware of any deployments beyond 72 GPU sockets.

Tomahawk Ultra

Broadcom’s headline silicon for SUE, the newly announced Tomahawk Ultra, is a 51.2 Tbps switch ASIC that has been specifically tuned to compete against Nvidia’s InfiniBand in traditional supercomputers and HPC clusters, as well as against NVLink in rack-scale deployments akin to Nvidia’s GB200 NVL72 or AMD’s Helios. While Tomahawk Ultra shares the same package as Broadcom’s Tomahawk 5 (TH5) and is pin-compatible with it, it is completely different silicon underneath.

The chip is tuned for high-performance networks and has a large radix, with 512 x 100 Gbps serializer/deserializers (SerDes). It will reportedly deliver latency as low as 250 nanoseconds while moving 64-byte packets at 77 billion packets per second.
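The headline figures can be sanity-checked with a little arithmetic. The sketch below is a back-of-the-envelope estimate, not Broadcom's own calculation: it assumes standard Ethernet framing, in which every frame costs an extra 20 bytes of preamble and inter-packet gap on the wire. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope check of the switch's headline numbers.
# Assumption (not from Broadcom): standard Ethernet framing, where every
# frame costs an extra 8 bytes of preamble and 12 bytes of inter-packet gap.

SERDES_LANES = 512        # lane count quoted for Tomahawk Ultra
LANE_GBPS = 100           # per-lane speed quoted for Tomahawk Ultra
WIRE_OVERHEAD_BYTES = 20  # preamble (8) + inter-packet gap (12), assumed


def aggregate_tbps(lanes: int, lane_gbps: int) -> float:
    """Total switch bandwidth in Tbps."""
    return lanes * lane_gbps / 1000


def packets_per_second(total_gbps: int, frame_bytes: int,
                       overhead_bytes: int = WIRE_OVERHEAD_BYTES) -> float:
    """Frames per second at line rate for a given frame size."""
    bits_per_frame = (frame_bytes + overhead_bytes) * 8
    return total_gbps * 1e9 / bits_per_frame


total_tbps = aggregate_tbps(SERDES_LANES, LANE_GBPS)
rate = packets_per_second(SERDES_LANES * LANE_GBPS, frame_bytes=64)
print(f"{total_tbps} Tbps aggregate")
print(f"{rate / 1e9:.1f} billion 64-byte packets per second")
```

Under these assumptions the estimate comes out at roughly 76 billion packets per second, in the same ballpark as the quoted 77 billion. The small gap presumably reflects framing details this sketch does not model.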

The small packets common in HPC systems can be a problem for networking equipment that is not built to handle the resulting high message rates. Tomahawk Ultra addresses this in part with an optimized Ethernet header, which leaves more room for payload in small packets.

The chip also includes a full complement of congestion control and reliability mechanisms, including forward error correction, credit-based flow control, and packet loss mitigation, while maintaining compatibility with existing Ethernet NICs.

The switch also supports in-network collectives, a capability Nvidia calls SHARP and uses in its NVLink switches. It allows operations such as all-reduce to be offloaded to the network, which improves efficiency by reducing bandwidth requirements.
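To see why offloading the reduction helps, compare how much data each accelerator has to transmit. The sketch below contrasts a textbook ring all-reduce with an idealized in-network reduction in which the switch sums the inputs itself. It is a simplified model, not a description of Broadcom's or Nvidia's implementation, and the function names are invented for illustration.

```python
# Idealized model of all-reduce traffic, per accelerator.
# Assumptions: a textbook ring all-reduce versus a switch that sums the
# inputs itself; real implementations add protocol overhead on top.


def ring_bytes_sent(num_accels: int, buffer_bytes: int) -> float:
    """Ring all-reduce: reduce-scatter plus all-gather, each of which moves
    (N - 1) / N of the buffer out of every accelerator."""
    return 2 * (num_accels - 1) / num_accels * buffer_bytes


def in_network_bytes_sent(buffer_bytes: int) -> float:
    """In-network reduction: each accelerator sends its buffer to the switch
    once and receives the reduced result back."""
    return float(buffer_bytes)


def switch_all_reduce(inputs: list[list[float]]) -> list[list[float]]:
    """Toy switch-side sum: add the vectors element-wise and hand every
    accelerator a copy of the result."""
    total = [sum(column) for column in zip(*inputs)]
    return [list(total) for _ in inputs]


n, size = 72, 1 << 30  # 72 accelerators, 1 GiB buffer (example values)
print(f"ring / in-network traffic: {ring_bytes_sent(n, size) / in_network_bytes_sent(size):.2f}x")
print(switch_all_reduce([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])[0])
```

In this model the ring approach transmits close to twice as much data per accelerator as the in-network version once the accelerator count is large.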

In terms of scale-up switch designs, Tomahawk Ultra provides just under twice the bandwidth of Nvidia’s fifth-gen NVLink switches, at 51.2 Tbps versus 28.8 Tbps. Broadcom can therefore support a scale-up system with 128 accelerators using the same number of switches as Nvidia’s 72-GPU NVL72 systems. Del Vecchio also claims that Tomahawk Ultra offers better latency than UALink. However, it is difficult to evaluate this claim until the first UALink hardware ships. Kurtis Bowman, director of architecture at AMD and chairman of the UALink Consortium, recently told our sibling site The Next Platform that the consortium expects switch latencies in the 100-150 ns range, which, if the consortium can pull it off, could give the protocol an advantage in certain applications.
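The 128-accelerator figure follows directly from the two bandwidth numbers. The sketch below works it through under three assumptions that are ours rather than Broadcom's: the same number of switches, a single switching tier, and the same bandwidth per accelerator in both systems. The names are hypothetical.

```python
# Scaling the accelerator count with switch bandwidth.
# Assumptions: same number of switches, a single switching tier, and the
# same bandwidth per accelerator in both systems.

NVLINK5_SWITCH_GBPS = 28_800   # 28.8 Tbps, as quoted
TOMAHAWK_ULTRA_GBPS = 51_200   # 51.2 Tbps, as quoted
NVL72_ACCELERATORS = 72


def scaled_accelerators(base_accels: int, base_switch_gbps: int,
                        new_switch_gbps: int) -> int:
    """Accelerators supported when only the per-switch bandwidth changes."""
    return base_accels * new_switch_gbps // base_switch_gbps


ratio = TOMAHAWK_ULTRA_GBPS / NVLINK5_SWITCH_GBPS
accels = scaled_accelerators(NVL72_ACCELERATORS, NVLINK5_SWITCH_GBPS,
                             TOMAHAWK_ULTRA_GBPS)
print(f"bandwidth ratio: {ratio:.2f}x")
print(f"accelerators: {accels}")
```

Integer Gbps values are used so the division comes out exact rather than picking up floating-point rounding.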


We’ll just have to wait and see, but we shouldn’t be waiting long. Broadcom claims that Tomahawk Ultra ASICs have already been shipped to customers. Since they are pin-compatible with the TH5, repurposing existing switch chassis should be fairly straightforward.

The best of both worlds?

Of course, just because UALink hardware hasn’t reached the market does not mean that the protocol is out of reach for AMD or Intel. In April, the UALink Consortium published its first spec. At its Advancing AI conference in June, AMD announced its Helios rack system, which will use UALink over Ethernet as its scale-up fabric.

Yes, AMD is tunneling UALink over Ethernet switches for its first rack-scale systems. This means AMD can begin working out any potential gremlins in the v1.0 spec while its network partners are still bringing their first UALink hardware to market. Robin Grindley is a principal product manager at Broadcom.

However, tunneling UALink through Ethernet isn’t ideal: AMD will not be able to hit UALink’s 100-150 ns latency target this way. But you can’t deliver what you don’t have. If AMD waited until 2027 to release its Helios rack, it would have to compete with Nvidia’s 600 kW, 144-GPU socket Kyber systems. ®

www.aiobserver.co
