
Oracle will be bringing 18 zettaFLOPS of new AI compute online by the end of next year


Oracle’s Ambitious AI Infrastructure Expansion with Nvidia and AMD

Unveiling Massive AI Compute Power by the End of Next Year

Oracle has announced plans to deploy more than 18 zettaFLOPS of AI computing capacity by the end of next year, built on cutting-edge hardware from both Nvidia and AMD. This scale of AI infrastructure marks a significant leap for cloud-based AI services.

Nvidia’s Dominant Role: The 800,000 GPU Cluster

Central to Oracle’s AI offering is an enormous cluster of 800,000 Nvidia GPUs delivering a peak of 16 zettaFLOPS at sparse FP4 precision. The cluster is part of Oracle Cloud Infrastructure’s Zettascale10 platform and showcases Nvidia’s full-stack solution of GPUs, rack systems, and networking.
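The cluster-level figure implies a per-GPU throughput that can be checked with simple arithmetic (a back-of-envelope sketch; the per-GPU number below is derived from the article's cluster figures, not stated by Oracle or Nvidia):

```python
# Back-of-envelope check of the Nvidia cluster figures quoted above.
ZETTA = 1e21
PETA = 1e15

cluster_flops = 16 * ZETTA   # quoted peak, sparse FP4, whole cluster
gpu_count = 800_000

per_gpu_flops = cluster_flops / gpu_count
print(f"Implied per-GPU rate: {per_gpu_flops / PETA:.0f} petaFLOPS (sparse FP4)")
# 16e21 / 8e5 = 2e16 FLOPS, i.e. 20 petaFLOPS per GPU
```

Twenty petaFLOPS of sparse FP4 per GPU is in line with Blackwell-generation accelerators, which lends the quoted cluster total some plausibility.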

Oracle’s deployment uses Nvidia’s Spectrum-X Ethernet switches to interconnect the GPUs, making this one of the largest AI clusters built on that networking technology. The infrastructure will also support Nvidia AI services, which Oracle plans to offer through its cloud platform, broadening access for enterprise AI workloads.

AMD’s Strategic Deployment: The MI450X and Helios Rack Architecture

While Nvidia leads in sheer GPU count, AMD is set to deploy 50,000 of its MI450X accelerators in Oracle data centers starting in the second half of next year, with further expansions expected in subsequent years. The MI450X was introduced at AMD’s AI-focused conference in June and is designed for rack-scale deployment in a system called Helios.

The Helios rack architecture mirrors Nvidia’s NVL72 design but uses AMD’s Ultra Accelerator Link, an open alternative to Nvidia’s proprietary NVLink, to connect the 72 MI450X GPUs in each rack. The racks follow the Open Rack Wide (ORW) form factor, a double-wide enclosure that counts as a single rack under Open Compute Project (OCP) standards.

Performance and Memory Specifications of AMD’s Helios Racks

AMD estimates that each Helios rack can achieve up to 2.9 exaFLOPS in FP4 precision and 1.4 exaFLOPS in FP8, supported by 31 TB of HBM4 memory delivering 1.4 petabytes per second of bandwidth. Although it remains unclear whether these figures represent dense or sparse FLOPS, the performance is comparable to Nvidia’s upcoming Vera Rubin NVL144 system, with AMD’s offering boasting significantly higher memory bandwidth.

Oracle’s initial deployment of 50,000 MI450X GPUs translates to just over 2 zettaFLOPS in ultra-low precision AI compute, underscoring the scale of their investment in next-generation AI hardware.
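The 2-zettaFLOPS figure follows directly from the rack specifications above. Dividing the quoted Helios rack numbers by 72 GPUs and scaling to the 50,000-GPU deployment reproduces it (a sketch using only figures quoted in this article; the per-GPU values are derived, not official):

```python
# Derive per-GPU figures from the quoted Helios rack specs, then
# scale to Oracle's initial 50,000-GPU deployment.
EXA, ZETTA, PETA = 1e18, 1e21, 1e15

rack_fp4_flops = 2.9 * EXA   # quoted FP4 throughput per Helios rack
rack_hbm4_tb = 31            # quoted HBM4 capacity per rack, in TB
gpus_per_rack = 72

per_gpu_fp4 = rack_fp4_flops / gpus_per_rack          # ~40 petaFLOPS
per_gpu_hbm_gb = rack_hbm4_tb * 1000 / gpus_per_rack  # ~430 GB

deployment = 50_000
total_fp4 = deployment * per_gpu_fp4                  # ~2.01e21 FLOPS
racks_needed = deployment / gpus_per_rack             # ~695 racks

print(f"{per_gpu_fp4 / PETA:.1f} PFLOPS and {per_gpu_hbm_gb:.0f} GB HBM4 per GPU")
print(f"50,000 GPUs -> {total_fp4 / ZETTA:.2f} zettaFLOPS across ~{racks_needed:.0f} racks")
```

The result, roughly 2.01 zettaFLOPS, matches the "just over 2 zettaFLOPS" stated above, and the roughly 695 racks involved convey the physical footprint of the initial phase.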

Challenges and Market Realities in Harnessing ZettaFLOPS

Despite the impressive numbers, practical use of zettaFLOPS-scale compute at these ultra-low precisions remains limited. Customers still favor higher-precision formats such as BF16 and FP8 for training, while FP4 is only beginning to prove itself for weight storage and inference. Leading AI developers, including OpenAI, are gradually adopting lower-precision formats but have yet to embrace FP4 for large-scale model training.
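A short enumeration illustrates why FP4 is so much harder to train with than BF16 or FP8. Assuming the E2M1 encoding defined in the OCP Microscaling (MX) specification, which is the 4-bit float format these accelerators target, the entire format holds only 16 bit patterns:

```python
# Enumerate every value representable in FP4 E2M1 (1 sign bit,
# 2 exponent bits with bias 1, 1 mantissa bit), per the OCP
# Microscaling spec. With so few distinct values, representing
# gradients directly in FP4 is far more delicate than in BF16 or FP8.
def fp4_e2m1(bits: int) -> float:
    sign = -1.0 if bits & 0b1000 else 1.0
    exp = (bits >> 1) & 0b11      # 2-bit exponent field
    man = bits & 0b1              # 1-bit mantissa field
    if exp == 0:                  # subnormal: no implicit leading 1
        return sign * (man / 2) * 2 ** 0
    return sign * (1 + man / 2) * 2 ** (exp - 1)

values = sorted({fp4_e2m1(b) for b in range(16)})
print(values)  # 15 distinct values (+0 and -0 coincide), max magnitude 6.0
```

Fifteen distinct values spanning -6 to 6 is enough for carefully scaled inference, but it explains why customers renting these clusters still quote their needs in BF16 or FP8 terms.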

Historically, enterprises leasing clusters with tens of thousands of GPUs have favored more mature data types to balance performance and accuracy. While training models natively at FP4 precision is possible, it is not yet mainstream. Oracle’s vast GPU resources are expected to be heavily utilized by major AI players, with OpenAI likely to secure a significant portion of this capacity.

OpenAI’s Expanding Footprint and Strategic Partnerships

OpenAI recently announced the Stargate Project, which includes five new AI data center locations across the United States, signaling rapid expansion. Both Nvidia and AMD have forged investment and deployment agreements with OpenAI, reflecting the company’s pivotal role in driving large-scale AI infrastructure adoption. Oracle stands as OpenAI’s largest cloud partner, positioning itself at the forefront of AI compute provisioning.

Although AMD’s share of the data-center GPU market remains far smaller than Nvidia’s, the gap is expected to narrow. Under a recent agreement, OpenAI has the option to acquire up to 160 million AMD shares at a nominal price, contingent on deploying six gigawatts’ worth of AMD accelerators, which could significantly boost AMD’s market presence.

Future Outlook: Scaling Towards Gigawatt-Level AI Deployments

Oracle describes the 50,000 MI450X cluster as the initial phase of a gigawatt-scale AI infrastructure rollout. Industry estimates suggest that Oracle could eventually deploy upwards of 180,000 MI450X GPUs, further solidifying its position as a leading AI cloud provider. This scale of deployment underscores the growing demand for specialized AI hardware capable of supporting increasingly complex models and workloads.
