Intel has officially missed out on AI in the datacenter
Comment: On Thursday, Intel’s hopes of competing with rivals Nvidia and AMD for a share of the AI accelerator market were dashed, as yet another GPU was scrapped.
Falcon Shores will never leave Intel’s labs. Interim co-CEO Michelle Johnston Holthaus revealed as much on Thursday’s Q4 earnings call with analysts: “We plan to leverage Falcon Shores as an internal test chip only, without bringing it to market.”
This decision means that Intel is at least a year, if not more, away from launching Jaguar Shores, the codename of its next GPU architecture. That’s assuming it doesn’t suffer a similar fate to Rialto Bridge and Ponte Vecchio.
This is not the first time Intel has cut short development of a GPU meant to compete with Nvidia, let alone AMD. Intel cut Rialto Bridge in early 2023; it was to be the successor to the datacenter-class GPU Max chips slated to power America’s Aurora supercomputer. At least those earlier Max chips were deployed, in limited quantities, by the likes of Argonne National Laboratory in the US, the UK’s Dawn supercomputer, and Germany’s SuperMUC-NG Phase 2 system.
By limited, we mean that Intel pulled the plug on GPU Max in mid-2024, presumably to focus on its Gaudi family of accelerators (more on these later) and to prepare for Falcon Shores’ debut.
In this context, Falcon Shores’ death seems almost inevitable. Intel’s roadmap originally called for a 2024 release, but that slipped by a year when Rialto Bridge was abandoned. Falcon Shores was also to include an XPU variant combining CPU and GPU on a single package, but those plans were scaled back in mid-2023 in favor of a more conventional GPU. Now Falcon Shores has been killed off entirely.
What about Gaudi? Despite its one-for-three record with high-end GPUs, Intel hasn’t given up on AI just yet. The x86 player still has its Gaudi3 accelerators.
The accelerators looked good on paper when they were first unveiled back in April. The parts are capable of 1,835 teraFLOPS of dense floating-point performance at either 8-bit or 16-bit precision. For compute-bound workloads, which typically run at BF16 (bfloat16, a 16-bit brain floating-point format), that was nearly twice the performance of Nvidia’s H100 or H200.
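If you want to check the math on that “nearly twice” claim, here’s a quick sketch. Gaudi3’s figure is Intel’s quoted dense number; the H100 figure is our assumption, drawn from Nvidia’s public specs for dense BF16 on the SXM part:

```python
# Sanity-checking the "nearly twice" claim. The H100 dense BF16 figure is an
# assumption taken from Nvidia's public specs (without sparsity), not from
# Intel's own comparison.
gaudi3_bf16_dense_tflops = 1835
h100_bf16_dense_tflops = 989

ratio = gaudi3_bf16_dense_tflops / h100_bf16_dense_tflops
print(f"Gaudi3 vs H100 at dense BF16: {ratio:.2f}x")  # ~1.86x
```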
Gaudi3 can also accommodate larger models than Nvidia’s H100 while theoretically delivering higher throughput, thanks to its 128GB of HBM2e memory, which provides 3.7 TBps of bandwidth.
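As a rough illustration of why that capacity matters, here’s a back-of-the-envelope sketch (our own assumptions, not Intel sizing guidance) of the largest model that fits in 128GB at common precisions:

```python
# The largest model that fits is roughly memory capacity divided by bytes per
# parameter, with some headroom left for the KV cache, activations, and runtime.
GB = 1e9
hbm_bytes = 128 * GB      # Gaudi3's 128GB of HBM2e
usable_fraction = 0.8     # assume ~20% reserved for KV cache and activations

for bits, label in [(16, "FP16/BF16"), (8, "FP8/INT8")]:
    max_params = hbm_bytes * usable_fraction / (bits / 8)
    print(f"{label}: ~{max_params / 1e9:.0f}B parameters")
# FP16/BF16: ~51B parameters
# FP8/INT8:  ~102B parameters
```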
Unfortunately for Intel, Gaudi3 is no longer competing against the H100. The part was introduced in early 2024 but only started trickling out to manufacturers late last year, and general availability is scheduled for this quarter.
This means that potential buyers will now be comparing the part with Nvidia’s Blackwell systems as well as AMD’s MI325X. Blackwell is better suited to training, offering higher floating-point performance, faster memory, and a larger scale-up domain, while AMD’s MI325X boasts twice the memory capacity and substantially higher bandwidth than Gaudi3, giving it the edge where bandwidth and capacity matter most.
That could explain why Intel failed to meet its target, despite then-CEO Pat Gelsinger’s insistence that Gaudi3 would generate over $500 million in accelerator revenue in the second half of 2024, and despite a price point pitched as far more competitive than Nvidia’s.
Why is unclear; it could come down to any number of factors, from system performance to the maturity of competing software ecosystems. But Intel’s biggest problem is that Gaudi3 is the end of the line.
As we understand it, its successor was to have been a variant of Falcon Shores that meshed Intel’s Xe graphics architecture with Gaudi’s enormous systolic arrays.
We may see Gaudi3 gain some ground in 2025. But given the uncertainty surrounding Jaguar Shores, many buyers won’t take that risk, especially when alternative platforms are available from chip designers with proven roadmaps and track records.

Intel’s shrinking position in the AI datacenter
Regardless of whose GPUs or AI accelerators datacenter operators choose, they still need a host CPU and, as Holthaus told Wall Street this week, there is still a large market for CPU-based inference, both on-prem and at the edge.
Intel’s Granite Rapids Xeons, launched last year, are its most compelling in years, boasting up to 128 cores and 256 threads along with support for MRDIMMs at speeds up to 8,800 MT/s.
This segment is becoming more competitive, too. It’s difficult to ignore the gains AMD continues to make in the datacenter with its Epyc family of processors. Mercury Research reports that the Ryzen slinger now commands approximately 24.2 percent of the server CPU market.
Meanwhile, Nvidia is increasingly relying on its own Arm-based Grace processors to power its top-specced accelerators, despite being a longtime Intel partner that used its CPUs across several generations of DGX reference designs. Intel can still win sockets in this market, as Nvidia continues to offer its eight-GPU HGX designs with x86 hosts. Intel’s window to capitalize on AI in the datacenter may be shrinking, but Chipzilla still has opportunities at the network edge as well as on the PC.
Intel, like most PC hardware makers, was banging the AI PC drum even before Microsoft revealed its 40 TOPS performance requirement for Copilot+ machines.
While this led to an awkward few months during which Qualcomm was the sole supplier of Copilot+-compatible CPUs, AMD and Intel were both able to catch up with the launches of Strix Point and Lunar Lake in July and September, respectively.
As we saw at Computex, Lunar Lake packs a 48 TOPS NPU, and Intel claims the system-on-chip can deliver 120 TOPS in total across its CPU, GPU, and NPU.

Intel still controls a large portion of the PC CPU market, so it is very much in the race, even if it isn’t yet clear how important AI features will be to PC customers. AMD, Qualcomm, and Nvidia all remain fierce competitors at the high end of the PC spectrum.

Intel’s CPU strategy, along with the emerging AI PC market, could also help it secure wins at the network edge. The Advanced Matrix Extensions (AMX) compute blocks baked into its CPUs since Sapphire Rapids can run machine learning and generative AI workloads without a GPU. Intel has demonstrated its Granite Rapids Xeons running 70-billion-parameter LLMs at a reasonable 12 tokens per second, thanks in part to that MRDIMM support.
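For context on that 120 TOPS claim, here’s a hedged tally; the NPU figure is the one Microsoft’s Copilot+ requirement keys on, while the GPU and CPU numbers are approximate public claims we’re treating as assumptions:

```python
# A rough tally of Intel's Lunar Lake platform-TOPS pitch. The GPU and CPU
# figures are approximate Computex-era claims, not numbers from this earnings call.
npu_tops = 48   # Lunar Lake NPU
gpu_tops = 67   # Xe2 GPU with XMX engines (Intel's claimed figure)
cpu_tops = 5    # CPU contribution, roughly

print(npu_tops + gpu_tops + cpu_tops)  # 120 platform TOPS
```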
Extrapolating from that 12-tokens-per-second demo, we would expect generation speeds of around 100 tokens per second for an 8-billion-parameter model, at least at a batch size of one. Batch size is one of the factors that limits the economics of CPU-only AI.
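The arithmetic behind that estimate is simple enough; here’s a rough sketch assuming single-batch token generation is memory-bandwidth-bound, so throughput scales inversely with parameter count at a fixed precision:

```python
# A rough extrapolation, not a benchmark: at batch size one, each generated
# token streams the whole model through memory, so tokens/sec scales roughly
# with 1 / parameter count at a fixed precision.
measured_tps = 12        # tokens/sec Intel demoed for a 70B model on Granite Rapids
measured_params_b = 70   # billions of parameters
target_params_b = 8      # an 8B-class model

estimated_tps = measured_tps * (measured_params_b / target_params_b)
print(f"~{estimated_tps:.0f} tokens/sec")  # ~105, i.e. "around 100"
```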
However, for a network appliance that only needs to run models occasionally, that would not be a problem, and skipping the GPU eliminates complexity and potential points of failure.
Don’t rule out an Intel comeback just yet
The rebirth AMD experienced in the post-Bulldozer era has taught us not to count Intel out.
Ryzen and Epyc were not the fastest chips, but they offered customers something they couldn’t get from Intel: a lot of cheap, good-enough cores.
AMD’s GPU strategy was similar, focusing first on delivering superior performance in high-performance computing (HPC). That netted AMD several high-profile wins for its Instinct accelerators, including America’s Frontier and, more recently, El Capitan supercomputers.
With its MI300-series accelerators and its pivot to AI, AMD differentiated itself again by targeting higher memory capacities than Nvidia. This helped AMD secure wins with hyperscalers and cloud providers, including Microsoft, all trying to reduce the cost of memory-bound workloads such as inference.
Seen in that light, the decision to scrap Falcon Shores gives Intel the opportunity to start over and build something that isn’t hampered by architectural decisions that no longer reflect the market.
Refocusing Jaguar Shores toward a rack-scale design is a promising indication of what’s to come. Intel has a good chance of regaining a foothold in the datacenter if it can differentiate its next GPU by offering something customers want but can’t get from its competitors. ®