The gloves came off Tuesday at VB Transform 2025, when alternative chip makers directly challenged Nvidia's dominance narrative in a panel on inference, exposing an essential contradiction: how can AI inference be a "commoditized factory" and still command a 70% gross margin?
Jonathan Ross, CEO of Groq, didn't mince words when discussing Nvidia's carefully crafted messaging, saying the "AI factory" framing is a marketing tool designed to make AI sound less scary. Sean Lie, CTO of Nvidia competitor Cerebras, was equally direct: "I don't think Nvidia minds having all the service providers fight it out for the last penny while they're sitting there comfortable with 70 points."
At stake is the future architecture of enterprise AI and hundreds of billions of dollars in infrastructure investment. The panel surfaced uncomfortable truths for the CIOs, CISOs, and AI leaders currently locked in weekly negotiations with OpenAI and other providers for more capacity.
The capacity crisis that no one is talking about
The panelists pointed out that today's token shortage exposes a fundamental flaw in the factory analogy. Traditional manufacturing responds to demand signals by adding capacity. But when enterprises need 10 times more inference capacity, they discover the supply chain cannot flex: GPUs carry two-year lead times, data centers need permits and power agreements, and the infrastructure was never designed for exponential scaling. The result is that providers have been forced to ration API access.
Dylan Patel, founder of SemiAnalysis, noted that Anthropic has jumped from $2 billion to $3 billion in ARR in just six months. Cursor grew from zero to $500 million ARR. OpenAI has crossed $10 billion. And still, enterprises cannot get the tokens they need.
Why 'factory thinking' breaks AI economics
Jensen Huang's "AI factory" concept (https://resources.nvidia.com/en-us-nim/intelligence-manufacturer) implies standardization, commoditization, and efficiency gains that drive down costs. The panel surfaced three fundamental reasons the metaphor fails.
First, inference is not uniform. Patel said that even today, inference providers fall along a curve of how fast, and at what price, they serve tokens. DeepSeek serves its own model at the lowest price, but delivers only 20 tokens per second. "Nobody wants a model that delivers 20 tokens per second. I talk faster than 20 tokens per second."
Second, quality varies widely. Ross drew a historical parallel to Standard Oil: when Standard Oil began, oil came in wildly varying quality. The AI inference market faces similar variation today, as providers use techniques that cut costs but can inadvertently compromise output quality.

Third, and perhaps most important, the economics are inverted. Ross noted that AI is unusual in that spending more actually produces a better product, something traditional software cannot claim: "You can't say, I'm going to spend twice as much on hosting my software, and the application gets better." That wasn't a casual observation; it was a rebuke of every provider that cuts corners to hit a price point.
Ross explained the mechanics: "Many people do a number of tricks to reduce quality, not deliberately, but to lower costs and improve their speed." The techniques sound technical, but the impact is plain. Quantization reduces numerical precision. Pruning removes parameters. Each optimization can degrade model performance in ways enterprises may not detect until production fails. Ross's Standard Oil parallel illuminates the stakes: the same quality problem exists in today's inference market. Providers betting that enterprises won't notice the difference between 95% and 100% accuracy are betting against companies like Meta, which have the sophistication to measure the degradation. That is a major issue for enterprise buyers.
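To make the quantization point concrete, here is a minimal, hypothetical Python sketch (not from the panel; the matrix size, distribution, and scheme are all illustrative) that simulates rounding model weights from 32-bit floats to 8-bit integers and measures the precision lost:

```python
import numpy as np

# Illustrative only: simulate naive post-training int8 quantization of a
# weight matrix, the kind of cost-saving trick Ross describes, and measure
# the error it introduces. Real serving stacks use more elaborate schemes.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=(1024, 1024)).astype(np.float32)

# Symmetric quantization: map [-max|w|, +max|w|] onto the int8 range [-127, 127].
scale = np.abs(weights).max() / 127.0
quantized = np.round(weights / scale).astype(np.int8)

# Dequantize and compare against the original full-precision weights.
restored = quantized.astype(np.float32) * scale
mean_abs_error = float(np.abs(weights - restored).mean())
print(f"scale = {scale:.6g}, mean absolute weight error = {mean_abs_error:.6g}")
```

The per-layer error is tiny, which is exactly why it escapes casual testing; compounded across dozens of layers, it can become the visible quality gap the panel warned about.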
For enterprise buyers, the playbook follows:

- Establish quality benchmarks before selecting providers.
- Audit existing inference providers for hidden optimizations.
- Accept that premium pricing is now a market feature.

When Meta's Mark Zuckerberg publicly called out the differences, the era of assuming functional equivalence across inference providers ended.
The $1-per-million-token paradox

The panel's most revealing moment came when discussion turned to pricing. Lie put an uncomfortable truth on the table: "If we believe that these million tokens can be as valuable as they are, right, it's not just about moving words. You don't charge a dollar for moving words. I pay my attorney $800 per hour for a two-page memo."
The observation cuts to the core of AI's pricing problem: the industry is racing to push token costs below $1.50 per million tokens while simultaneously claiming those tokens will transform every aspect of business. The panel's implicit agreement was that the math doesn't add up. Ross revealed that "pretty much everyone, including these fast-growing startups, is spending a lot on tokens, and the amount they spend almost matches their revenue one to one." Panelists argued that a 1:1 ratio of token spend to revenue is an unsustainable business model.
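Some back-of-the-envelope arithmetic, using the figures quoted above plus an assumed memo length and tokens-per-word ratio, shows how stark the gap is:

```python
# Rough arithmetic behind Lie's point. The $1.50/M-token price and $800/hr
# attorney rate come from the article; the memo length, the ~1.3
# tokens-per-word ratio, and the two hours of work are assumptions.
price_per_million_tokens = 1.50          # dollars
attorney_rate_per_hour = 800.0           # dollars
memo_words = 1_000                       # rough length of a two-page memo
tokens_per_word = 1.3                    # common rule of thumb for English

memo_tokens = memo_words * tokens_per_word
token_cost = memo_tokens / 1_000_000 * price_per_million_tokens
attorney_cost = attorney_rate_per_hour * 2   # assume two hours of work

print(f"memo tokens:   ~{memo_tokens:,.0f}")
print(f"token cost:    ${token_cost:.4f}")
print(f"attorney cost: ${attorney_cost:,.2f}")
print(f"gap:           ~{attorney_cost / token_cost:,.0f}x")
```

If tokens sell for roughly six orders of magnitude less than the knowledge work they supposedly replace, either the pricing or the transformation claim has to give, which is the paradox the panel was circling.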
Performance is everything
Cerebras and Groq aren't just competing on price; they're competing on performance, fundamentally changing what's possible at inference time. "With the wafer-scale technology we've built, we're enabling 10 to 50 times faster performance than the fastest GPUs today," Lie said.
That's not an incremental improvement; it opens up entirely new use cases. "We have clients with agentic workflows that could take 40 minutes, and they want them to run in real time," Lie explained. "These things aren't possible, even if they're willing to pay top dollar."
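The arithmetic behind that claim is simple; here is a quick sketch using Lie's own figures (the intermediate 25x value is illustrative):

```python
# What a 10-50x inference speedup does to a 40-minute agentic workflow.
baseline_minutes = 40
for speedup in (10, 25, 50):
    seconds = baseline_minutes * 60 / speedup
    print(f"{speedup:2d}x faster: {seconds:6.0f} seconds")
# At the top of the range the workflow drops to under a minute,
# close enough to interactive to become a customer-facing feature.
```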
This speed differential creates a bifurcated market that defies standardization. Enterprises that need real-time inference for customer-facing applications cannot use the same infrastructure as those running overnight batch processes.
The real bottleneck is power and data centers
While public attention has fixated on chip supply, the panel revealed that the binding constraint on AI deployment is power and data center capacity. "Data center capacity is an important problem," Patel said, noting how hard it has become to find data center space in the United States. "Power is a major problem."
The infrastructure challenge goes beyond chip manufacturing to fundamental resource constraints. Patel explained that chip supply itself is not the choke point: "TSMC in Taiwan can make over $200 million worth of chips. It's not just that… the speed at which they scale up is ridiculous."
But chips mean nothing without the infrastructure to run them. Patel said power is a driving force behind the recent large Middle East deals, and partly why both Groq and Cerebras have a significant presence in the region. The global scramble for compute has enterprises "going around the world" hunting for power, data center capacity, and even electricians to build out electrical systems.
Google's "success disaster" becomes everyone's reality
Ross shared an anecdote from his time at Google: "In 2015, there was a term that became very popular: success disaster. Some teams had built AI applications that, for the first time, performed better than humans, and the demand for compute was so high they needed to quickly double or triple their global data center footprint." The pattern now applies to everyone: applications either fail, or they succeed so fast they slam into infrastructure limits. There is no middle ground, and the smooth scaling curve that factory economics predicts does not exist.
What this means for enterprise AI strategies
The panel’s revelations require strategic recalibration for CIOs, CISOs, and AI leaders:
Capacity planning requires new models. Traditional IT planning assumes linear growth; AI workloads break that assumption. When a successful application grows token usage 30% month over month, an annual capacity plan is obsolete within quarters. Enterprises must move from static procurement cycles to dynamic capacity management: build contracts with burst provisions, monitor usage weekly rather than quarterly, and accept that AI scaling curves look like viral adoption curves, not traditional enterprise software rollouts.
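A quick sketch of the compounding, using the 30% monthly growth figure above (the normalized starting usage is an assumption for illustration):

```python
# How 30% month-over-month token growth outruns an annual capacity plan.
monthly_growth = 0.30
usage = 1.0  # normalized token usage at the start of the fiscal year

for month in range(1, 13):
    usage *= 1 + monthly_growth
    print(f"month {month:2d}: {usage:5.1f}x initial usage")

# After 12 months usage sits near 23x the starting point; a plan sized
# with "generous" 2x headroom is exhausted inside the first quarter.
```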
The speed premium is permanent. The idea that inference will commoditize into uniform pricing ignores the massive performance gaps between providers. Enterprises must budget for speed where it matters.
Architecture matters more than optimization. Groq and Cerebras don't win by building better GPUs; they win by rethinking the fundamental architecture of AI compute. Enterprises that bet their entire infrastructure on GPU-based approaches may find themselves in the slow lane.
Power infrastructure is strategic. The binding constraints are kilowatts and cooling, not chips or software. Smart enterprises are already locking in power capacity and data center space through 2026 and beyond.
The infrastructure reality enterprises cannot ignore
The panelists converged on a fundamental point: the AI factory metaphor isn't just wrong, it's dangerous. Enterprises that build strategies around commodity inference pricing and standardized delivery are planning for a market that doesn't exist.
Three brutal realities govern the actual market:
- Capacity scarcity creates a power inversion: suppliers dictate terms, and enterprises beg for allocations.
- Quality variance, the difference between 95% and 100% accuracy, determines whether AI applications succeed or fail catastrophically.
- Infrastructure constraints, not technology, set the limits of AI transformation.
For CIOs, CISOs, and AI leaders, the mandate is to abandon factory thinking. Lock in power capacity now. Audit inference providers for hidden quality degradation. Build vendor relationships around architectural advantages, not marginal cost savings. And accept that paying a premium, even one carrying a 70-point margin, for reliable, high-quality inference may be the best investment available.
Transform's alternative chip makers did more than challenge Nvidia's narrative. They revealed the choice enterprises actually face: pay for performance and quality, or keep fighting weekly capacity negotiations. The panel's conclusion was unanimous: success comes from matching specific workloads to the appropriate infrastructure, not from chasing one-size-fits-all solutions.
