Nvidia GPU roadmap confirms that Moore’s Law has died and been buried

Comment Nvidia CEO Jensen Huang is fond of saying that Moore's Law is dead. And at Nvidia's GTC this month, the GPU-slinger's top exec showed just how far behind the law has fallen.

On stage, Huang revealed not only the next-gen Blackwell Ultra processors but also a surprising amount about the two generations of accelerated computing platforms that follow, including a 600kW system packing 576 GPUs. We also learned that a new GPU family due to arrive in 2028 will be named after Richard Feynman. You've got to be kidding!

It's not uncommon for chipmakers to tease their roadmaps, but we don't usually get this much detail at once. The reason is that Nvidia is stuck: it has hit not one but several roadblocks, and there isn't much it can do about them except throw money at the problem.

These problems won't surprise anyone who's been paying attention. Distributed computing is a game of bottleneck whack-a-mole, and AI may be the ultimate mole hunt.

From here, it’s all uphill

One of the most obvious challenges is scaling compute.

In recent years, advances in process technology have slowed to a crawl. There are still knobs to turn, but they're becoming exponentially harder to move. Nvidia's answer to these limitations is simple: cram more silicon into each compute node. Its densest systems today mesh up to 72 GPUs into a single compute domain over the 1.8TB/s NVLink fabric, and eight or more of those racks can then be stitched together with InfiniBand or Ethernet to reach the desired compute and memory capacity.

At GTC, Nvidia revealed plans to push that to 144 and eventually 576 GPUs per compute domain. And the scaling up isn't limited to the rack; it's happening at the chip package, too.

We saw this with the launch of Nvidia's Blackwell accelerators a year ago. The chips claimed a 5x performance boost over Hopper, which sounded great until you realized it required twice the die count, a new 4-bit datatype, and an extra 500 watts of power.

In reality, at matched precision, each of Nvidia's top-specced Blackwell dies is only about 1.25x faster than a GH100 (1,250 dense teraFLOPS versus 989); there just happen to be two of them per package.
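For the curious, the marketing math decomposes roughly like this. A quick back-of-the-envelope sketch, using the figures above and assuming the 5x claim simply stacks die count, per-die gains, and the FP8-to-FP4 precision drop:

```python
# Back-of-the-envelope decomposition of Nvidia's "5x over Hopper" claim,
# using the per-die figures quoted above. Illustrative only.
gh100_dense_tflops = 989      # Hopper GH100, dense, at a fixed precision
blackwell_die_tflops = 1250   # one top-specced Blackwell die, same precision

per_die_gain = blackwell_die_tflops / gh100_dense_tflops  # ~1.26x
dies_per_package = 2   # Blackwell pairs two reticle-limited compute dies
precision_gain = 2     # halving precision (FP8 -> FP4) doubles peak FLOPS

total_gain = per_die_gain * dies_per_package * precision_gain
print(f"per die: {per_die_gain:.2f}x, full package: {total_gain:.1f}x")  # ~5x
```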

Nvidia CEO Jensen Huang anticipates that by 2027 racks will surge to 600kW with the debut of the Rubin Ultra NVL576.

We don't know which process technology Nvidia will use for its next-gen chips, but we do know Rubin Ultra will continue this trend, going from two reticle-limited compute dies to four. Huang expects TSMC's 2nm to deliver roughly a 20 percent efficiency gain; that will still be one hot package.

And it's not just compute; the same goes for memory. You might have noticed that Rubin Ultra's capacity and bandwidth are much higher than Rubin's: 1TB per package instead of 288GB. Part of that comes from faster, higher-capacity memory modules, but the other half comes from doubling the silicon dedicated to memory, from eight HBM stacks in Blackwell and Rubin to 16 in Rubin Ultra.

The higher capacity means Nvidia can fit more model parameters into a single package, or about 500 billion per "GPU" now that it counts individual dies rather than sockets. HBM4e is also expected to double memory bandwidth compared with HBM3e, from the current 4TB/s on Blackwell to 8TB/s with Rubin Ultra.
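Here's a rough sketch of where those figures land (our own arithmetic, not an Nvidia spec sheet; the per-stack capacity is inferred from the totals above and the parameter count assumes 4-bit weights):

```python
# Rough memory math for Rubin Ultra, based on the figures quoted above.
# Per-stack capacity is inferred; parameter counts assume 4-bit (0.5-byte)
# weights. Illustrative only, not an Nvidia spec sheet.
package_capacity_gb = 1024   # ~1TB of HBM4e per Rubin Ultra package
hbm_stacks = 16              # doubled from eight in Blackwell and Rubin
per_stack_gb = package_capacity_gb / hbm_stacks       # ~64GB per stack

dies_per_package = 4         # Rubin Ultra packs four compute dies
gb_per_die = package_capacity_gb / dies_per_package   # 256GB per "GPU"

bytes_per_param = 0.5        # FP4 weights
params_per_die = gb_per_die * 1e9 / bytes_per_param
print(f"~{params_per_die / 1e9:.0f} billion parameters per die")  # ~512B
```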

Unfortunately, barring a major breakthrough in process technology, it's unlikely that future Nvidia accelerators will be able to pack in much more silicon.

Process improvements aren't the only way to scale compute or memory, though. Generally speaking, dropping from 16-bit to 8-bit precision can double throughput while halving memory requirements. A good chunk of Nvidia's performance gains have come from shaving bits: the 5x floating-point uplift it claimed going from Hopper to Blackwell leaned heavily on dropping four bits of precision.

Below four bits, however, LLM inference gets rougher, with perplexity scores rising rapidly. That said, there's interesting research into super-low-precision quantization, as low as 1.58 bits, that still maintains accuracy.
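To put the memory side of that in perspective, here's a quick illustrative sketch of what weight precision does to the footprint of a hypothetical 70-billion-parameter model (weights only, ignoring activations, KV cache, and quantization overheads):

```python
# Weight-only memory footprint of a hypothetical 70B-parameter model at
# various precisions. Ignores activations, KV cache, and scaling factors.
params = 70e9

for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4), ("1.58-bit", 1.58)]:
    gigabytes = params * bits / 8 / 1e9
    print(f"{name:>8}: {gigabytes:6.1f} GB")
# FP16 ~140 GB, FP8 ~70 GB, FP4 ~35 GB, 1.58-bit ~13.8 GB
```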

Reducing precision isn't the only way to wring out more FLOPS, either. You can also devote less die area to the high-precision datatypes that AI workloads don't need.

That's what we saw with Blackwell Ultra. Ian Buck, VP of Nvidia's accelerated computing business unit, told us the company actually nerfed the chip's double-precision (FP64) performance in exchange for more 4-bit FLOPS.

Whether or not this is a sign of FP64 being phased out at Nvidia is yet to be determined, but if double-precision grunt is important to you, AMD’s APUs and GPUs should be on your list.

Nvidia's future is clear: its compute platforms are only going to get bigger, denser, hotter, and more power hungry. During a press Q&A last week, a calorie-deprived Huang said the practical limit on a rack is however much power you can feed it.

“A datacenter is now 250 megawatts. That’s kind of the limit per rack. I think the rest of it is just details,” Huang said. “If you said that a datacenter is a gigawatt, and I would say a gigawatt per rack sounds like a good limit.”

There’s no escaping the power issue

600kW racks are a headache for datacenter operators.

To be clear, the problem of cooling megawatts of ultra-dense compute isn't a new one; Cray, Eviden, and Lenovo have been working on it for years. What's changed is that we're no longer talking about a few boutique compute clusters a year, but dozens, some of them large enough to dethrone the most powerful supers on the Top500, if anyone could justify tying up 200,000 Hopper GPUs long enough to run Linpack.
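That Linpack quip isn't much of a stretch, either. A very rough peak estimate, assuming about 34 FP64 teraFLOPS per Hopper GPU (the vector figure, an assumption on our part) and ignoring interconnect and efficiency losses:

```python
# Very rough peak-FP64 estimate for a 200,000-GPU Hopper cluster.
# The ~34 teraFLOPS per GPU is our assumption (H100 vector FP64), and real
# Linpack runs would land well below this theoretical peak.
gpus = 200_000
fp64_tflops_per_gpu = 34

peak_exaflops = gpus * fp64_tflops_per_gpu / 1e6
print(f"~{peak_exaflops:.1f} exaFLOPS peak FP64")  # comfortably Top500 territory
```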

At these scales, low-volume, highly specialized thermal management and power systems no longer cut it, and the datacenter vendors who sell the less glamorous bits and pieces needed to make these multimillion-dollar racks work are only now catching up.

This is probably why so many of the Blackwell deployments announced so far have been for the air-cooled HGX B200 rather than the NVL72 Huang keeps hyping. These eight-GPU HGX systems can slot into many existing H100 environments: datacenter racks have been designed around 30-40kW for years, so jumping to 60kW isn't a huge leap, and operators can always drop down to just two or three servers per rack if they have to.
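The rack math is simple enough. A quick sketch, using an assumed draw of roughly 13kW for an eight-GPU HGX B200-class box (our estimate, not a vendor figure):

```python
# How many air-cooled, eight-GPU HGX-class servers fit under a given rack
# power budget. The ~13kW per-server draw is our rough estimate for an
# HGX B200-class box, not a vendor spec; adjust to taste.
server_kw = 13
gpus_per_server = 8

for rack_kw in (30, 40, 60, 120):
    servers = rack_kw // server_kw
    print(f"{rack_kw:>4}kW rack: ~{servers} servers ({servers * gpus_per_server} GPUs)")
```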

Here are those 'AI factories' Huang is always mentioning

The NVL72 is another matter. It's a rack-scale design heavily inspired by the hyperscalers, with DC bus bars and power sleds out the front, and at 120kW of liquid-cooled compute, deploying more than a couple of them in existing facilities gets difficult. That will only get harder when Nvidia's 600kW monster racks make their debut in late 2027.

This is where those "AI factories" Huang keeps rattling on about come into play: purpose-built datacenters designed in collaboration with partners like Schneider Electric to cope with the power and thermal demands of AI.

And, surprise surprise, just a week after Nvidia laid out its GPU roadmap for the next three years, Schneider announced a $700 million expansion of its US operations to boost production.

Of course, the infrastructure needed to power and cool ultra-dense systems isn't the only issue. Nvidia also has little control over how power gets delivered to the datacenter in the first place.

Every time Meta, Oracle, or Microsoft announces a new AI bit barn, it's usually followed by a lucrative power purchase agreement. Meta's mega DC, soon to be born in the bayou, was announced alongside a 2.2GW power plant. So much for sustainability and carbon neutrality.

As much as we'd like to see nuclear power make a comeback, it's difficult to take small modular reactors seriously when even the most optimistic predictions place deployments in the 2030s.

  • A closer glance at Dynamo, Nvidia’s ‘operating system’ for AI inference.
  • Microsoft walking out of datacenter leases isn’t a sign that the AI bubble has burst.
  • Schneider Electric pumps $800M into US operations as AI datacenter demands surge.
  • Nvidia's Vera Rubin GPU, CPU roadmap charts a course for hot, hot, hot 600 kW racks

AMD, Intel, and the other chip designers and cloud providers vying for a piece of Nvidia's market share will face these same challenges soon enough. Nvidia is simply among the first to hit them, and while that has its drawbacks, it also puts the company in a unique position to influence the future of datacenter thermal and power design.

As we've already touched on, Huang's willingness to reveal the next three generations of GPU technology, and tease a fourth, was about making sure Nvidia's infrastructure partners are ready to support them once they arrive.

“The reason why I communicated to the world what Nvidia’s next three, four year roadmap is now everybody else can plan,” Huang said.

On the flip side, these efforts also clear a path for competing chipmakers. If Nvidia builds a 120kW or 600kW rack and colocation and cloud operators commit to supporting it, then AMD or Intel can pack the same amount of compute into their own rack-scale platforms without having to worry about where customers will put them. ®
