UALink unveils its first AI interconnect spec – usable in 18 months

The Ultra Accelerator Link Consortium (UAC) has released its first GPU interconnect standard: UALink 1.0.

The consortium formed in May 2024, backed by vendors including AMD, AWS, Broadcom, Cisco, Google, HPE, Intel, Meta, Microsoft and Astera Labs, to provide an open alternative to Nvidia’s NVLink technology that would allow the creation of networked GPU clusters to run AI workloads at large scale.

Members aren’t only advancing the cause of open standards. Nvidia’s networking business generated over $13 billion in revenue during its last financial period, and the GPU giant is looking to expand it. UALink members want a cheaper alternative that they can either deploy themselves at hyperscale or profit from by building hardware that the rest of us purchase.

The group also believes that the world is ready for an open networking standard that works with GPUs from different vendors, rather than forcing users to build a separate network for each accelerator vendor.

To achieve these goals, UAC wants to use the Ethernet networks that most organizations already have.

UALink 1.0 enables a 200 Gbps (gigabits per second) connection to an accelerator. It can also quadruple that speed, to 800 Gbps, by allowing four connections to each accelerator.

The specification allows for the creation of compute pods with up to 1,024 accelerators, and achieves what the consortium describes as “the same raw speed as Ethernet with the latency of PCIe switches” while consuming between a third and a half of the power of an Ethernet network.
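For a sense of scale, the quoted figures can be multiplied out. The following is a back-of-the-envelope sketch only: it assumes raw line rate with no protocol or encoding overhead, and that every accelerator in a maximum-size pod uses all four connections. The function and constant names are illustrative and do not come from the specification.

```python
# Back-of-the-envelope figures derived from the numbers quoted in the article.
# Assumptions (not from the spec text): raw line rate only, no protocol or
# encoding overhead, and every accelerator using all four connections.

GBPS_PER_CONNECTION = 200        # per-connection rate quoted for UALink 1.0
CONNECTIONS_PER_ACCELERATOR = 4  # maximum connections per accelerator
ACCELERATORS_PER_POD = 1024      # maximum pod size quoted for UALink 1.0


def accelerator_gbps(connections: int = CONNECTIONS_PER_ACCELERATOR) -> int:
    """Raw aggregate rate into one accelerator, in gigabits per second."""
    return GBPS_PER_CONNECTION * connections


def pod_tbps(accelerators: int = ACCELERATORS_PER_POD) -> float:
    """Sum of raw accelerator rates across a pod, in terabits per second."""
    return accelerator_gbps() * accelerators / 1000


if __name__ == "__main__":
    per_accel = accelerator_gbps()
    print(f"Per accelerator: {per_accel} Gbps ({per_accel / 8:.0f} GB/s)")
    print(f"Full pod: {pod_tbps():.1f} Tbps")
```

Under those assumptions, each accelerator sees 800 Gbps (100 GB/s) of raw bandwidth and a full pod sums to roughly 819 Tbps. Real-world throughput would be lower once overheads are counted.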

Kurtis Bowman, chair and director of UAC, told The Register that the specification draws heavily on AMD’s Infinity Fabric product.

“We were able to build on that [Infinity Fabric],” Bowman said, adding that the consortium also used technology from other UAC members to meet its needs. He acknowledged that it will take around 18 months for compliant hardware to become available, but said that is six months less than is typically needed to turn a specification into a product. Bowman expects HPE, Dell and Lenovo, as well as Broadcom and Synopsys, to adopt the specification and deliver AI solutions that use it.

A second specification is already in the works to take advantage of 400G Ethernet as it becomes mainstream.

www.aiobserver.co
