The ‘cheap AI model’ is actually consuming your compute budget


A new study reveals that open-source AI models consume significantly more computing power than their closed-source counterparts when performing identical tasks, a finding that could undermine their cost advantage and change how enterprises evaluate AI deployment strategies.

The research, conducted by AI firm Nous Research, found that open-weight models used between 1.5 and 4 times more tokens – the basic units of AI computation – than closed models such as those from OpenAI and Anthropic. For simple knowledge questions, the gap widened dramatically, with some open models using up to 10 times more tokens.

Measuring thinking efficiency in reasoning models: The missing benchmark https://t.co/b1e1rJx6vZ

We measured token usage across reasoning models: open models output 1.5-4x more tokens than closed models on identical tasks, but with huge variance depending on task type (up to…

— Nous Research (@NousResearch) August 14, 2025

“Open weight models use 1.5-4x as many tokens as closed ones (up to 10x for simple knowledge questions), making them sometimes more expensive per query despite lower per-token costs,” the researchers wrote in a report published on Wednesday.

These findings challenge a prevailing assumption in the AI industry: that open-source models offer clear economic advantages. While open-source models are typically cheaper per token, the study shows that this advantage can be easily offset if they need more tokens to solve the same problem.
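The arithmetic behind that offset is straightforward. The sketch below uses hypothetical prices and token counts (not figures from the study) to show how a lower per-token price can still yield a higher per-query cost:

```python
def cost_per_query(tokens_used: int, price_per_million_tokens: float) -> float:
    """Effective cost of one query: tokens consumed times the per-token price."""
    return tokens_used * price_per_million_tokens / 1_000_000

# Hypothetical example: a closed model priced higher per token,
# versus an open model that uses 4x the tokens at under a third of the price.
closed_cost = cost_per_query(500, 2.00)   # 500 tokens at $2.00 per million
open_cost = cost_per_query(2000, 0.60)    # 2,000 tokens at $0.60 per million

print(f"closed: ${closed_cost:.4f}, open: ${open_cost:.4f}")
```

Despite charging less than a third as much per token, the open model in this hypothetical ends up about 20% more expensive per query because of its higher token consumption.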




The research examined the real cost of AI and why ‘cheaper’ models may break your budget. The team tested 19 different AI models across three categories of tasks: basic knowledge questions, mathematical problems, and logic puzzles. They measured “token efficiency” – how many computational units a model uses relative to the complexity of its solution – a metric that has received little systematic study despite its significant cost implications. “Token efficiency is a critical measure for many practical reasons,” the researchers wrote. “While hosting open-weight models may be cheaper than other models, this cost advantage can be easily offset if the models require more tokens to solve a problem.”
Open-source AI models use up to 12 times more computational resources than the most efficient closed models for basic knowledge questions. (Credit: Nous Research)

The inefficiency is particularly pronounced for Large Reasoning Models (LRMs), which use extended “chains of thought” to solve complex problems. These models, designed to work through problems step by step, can consume thousands of tokens pondering simple questions that should require minimal computation.

The study found that reasoning models spend “hundreds” of tokens on simple knowledge questions such as “What is Australia’s capital?” – questions that can be answered in a single word.

The research revealed striking differences between model providers. OpenAI’s models, particularly its o4-mini and newly released open-source gpt-oss variants, demonstrated exceptional token efficiency, especially for mathematical problems. The study found OpenAI models to be “extremely efficient in math problems,” using up to three times fewer tokens than other commercial models.

Among open-source options, Nvidia’s llama-3.3-nemotron-super-49b-v1 emerged as “the most token efficient open weight model across all domains,” while newer models from companies like Mistral showed “exceptionally high token usage” as outliers.

Efficiency gaps varied by task type. Open models used roughly twice as many tokens for math and logic problems, but the difference was far larger for simple knowledge questions, where extended reasoning is unnecessary.

OpenAI’s latest models achieve the lowest costs for simple questions, while some open-source alternatives can cost significantly more despite lower per-token pricing. (Credit: Nous Research)

What enterprise leaders should know about AI computing cost

These findings have immediate implications for enterprise AI adoption, where computing costs scale rapidly with usage. Companies evaluating AI models often focus on accuracy benchmarks and per-token pricing but overlook the total computation required for real-world tasks. The researchers found that the higher API prices of closed-weight models are often offset by their better token efficiency.

The study also revealed that closed-source providers are actively optimizing for efficiency: closed-weight models have been iteratively tuned to use fewer tokens and reduce inference cost. Open-source models, by contrast, have tended to increase token usage in newer versions.

The computational overhead varies dramatically between AI providers, with some models using over 1,000 tokens for internal reasoning on simple tasks. (Credit: Nous Research)

Researchers cracked the code for AI efficiency measurement

The research team faced unique challenges in measuring efficiency across model architectures. Many closed-source models do not reveal their raw reasoning processes, instead providing compressed summaries to prevent competitors from copying their techniques. The researchers therefore used completion tokens – the total computation units billed for each query – as a proxy for reasoning effort. They found that “most closed-source models won’t share their raw reasoning traces,” and instead “use smaller language models to transcribe” the chain of thought into summaries or compact representations. The team’s math benchmarks drew on problems from the American Invitational Mathematics Examination (AIME).

Different AI models show varying relationships between computation and output, with some providers compressing reasoning traces while others provide full details. (Credit: Nous Research)

The future of AI efficiency – What’s next?

The researchers suggest that token efficiency should join accuracy as a primary target for future model development. “A densified CoT may also allow for more efficient usage of context and counter context degradation during challenging reasoning tasks,” they wrote.

OpenAI’s newly released open-source gpt-oss models, which demonstrate state-of-the-art efficiency with freely available chains of thought, could serve as a benchmark for optimizing other open models.

The complete research dataset and evaluation code are available on GitHub, allowing other researchers to validate and extend the findings. As the AI industry races toward more powerful reasoning abilities, this study suggests the real competition may not be who can build the smartest AI – but who can build the most efficient one.

In a world in which every token counts, the models that are the most wasteful may be priced out of the marketplace, regardless of their ability to think.
