Singapore-based AI startup Sapient Intelligence has developed a new AI architecture that can perform reasoning tasks up to 100 times faster than LLMs using just 1,000 training examples.
The architecture, known as the Hierarchical Reasoning Model (HRM), is inspired by the way the human brain uses distinct systems for slow, deliberate planning and fast, intuitive computation. The model achieves impressive results with a fraction of the data and memory required by current LLMs. This efficiency could matter for real-world enterprise AI applications, where data is scarce and computational resources are limited.
Current LLMs rely heavily on chain-of-thought (CoT) prompting when faced with complex problems. They break problems down into text-based intermediate steps, essentially forcing the model to “think out loud” as it works toward a solution.
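An illustrative example of what such a prompt looks like (the wording and arithmetic problem here are invented for illustration, not taken from the paper):

```python
# Illustrative chain-of-thought prompt: the model is asked to spell out
# its intermediate reasoning as text before giving the final answer.
prompt = (
    "Q: A store sells pens at $2 each. If Ada buys 3 pens and pays "
    "with a $10 bill, how much change does she get?\n"
    "A: Let's think step by step.\n"
    "Step 1: 3 pens cost 3 * 2 = $6.\n"
    "Step 2: Change is 10 - 6 = $4.\n"
    "Answer: $4"
)
print(prompt)
```

Every one of those intermediate tokens must be generated serially, which is what makes CoT reasoning slow and data-hungry.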
Although CoT has improved LLMs’ reasoning abilities, it has fundamental limitations. Researchers at Sapient Intelligence argue in their paper that “CoT is a crutch and not a satisfactory answer.” It relies on brittle, human-defined decompositions, where a single mistake or misordered step can completely derail the reasoning process.
This dependency on generating explicit language tethers the model’s reasoning to the token level, often requiring massive amounts of training data and producing long, slow responses. It also ignores the “latent reasoning” that occurs internally, without being articulated in language.
The researchers note that “a more efficient approach is required to minimize these data needs.”
A hierarchical approach inspired by the brain
To move beyond CoT, the researchers explored “latent reasoning,” where instead of generating “thinking tokens,” the model reasons in its internal, abstract representation. This is more in line with how humans think. As the paper puts it, “the brain sustains lengthy, coherent chains of reasoning with remarkable efficiency in a latent space, without constant translation back to language.”
But achieving this level of deep, internal reasoning is challenging. Simply adding more layers to a deep-learning model can trigger the “vanishing gradient” problem, where learning signals weaken as they pass back through the layers and training becomes ineffective. Recurrent architectures, which loop over computations, can instead suffer from “early convergence,” where the model settles on a solution too quickly without fully exploring the problem.
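The vanishing-gradient problem can be seen with a toy calculation (a minimal sketch, not the paper’s experiment): when each layer’s local derivative is below 1, the backpropagated signal shrinks multiplicatively with depth.

```python
# Toy illustration of vanishing gradients: a learning signal passing
# backward through many layers shrinks multiplicatively when each
# layer's local derivative is less than 1.
def backprop_signal(num_layers: int, local_grad: float = 0.5) -> float:
    """Magnitude of the learning signal after num_layers layers."""
    signal = 1.0
    for _ in range(num_layers):
        signal *= local_grad
    return signal

shallow = backprop_signal(4)    # 0.0625: still a usable signal
deep = backprop_signal(64)      # ~5e-20: effectively zero
print(shallow, deep)
```

With 64 layers the signal is around 5e-20, so the earliest layers receive essentially no training signal, which is why naive depth does not buy reasoning depth.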
The Sapient team sought a better solution by turning to neuroscience. They argue that the human brain offers a powerful blueprint for the computational depth that current artificial models lack: the brain organizes computation hierarchically, across cortical areas operating at different timescales, enabling multi-stage reasoning.
Inspired by this, they designed HRM as two coupled recurrent modules: a high-level (H) module for slow, abstract planning and a low-level (L) module for fast, detailed computation. This structure enables a process the team calls “hierarchical convergence”: the fast L-module tackles a portion of the problem over multiple steps, then the H-module takes the result, updates its overall strategy, and hands the L-module a new, refined sub-problem. Resetting the L-module this way prevents it from getting stuck (early convergence) and allows the system to perform long sequences of reasoning steps with a lean architecture that does not suffer from vanishing gradients.
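The nested-loop dynamic described above can be sketched in a few lines. Everything here is illustrative: the function names, dimensions, and random linear updates are stand-ins for the learned recurrent networks in the actual HRM.

```python
import math
import random

# Toy sketch of hierarchical convergence: a slow H-module directs a
# fast L-module, resetting it between planning cycles. The weights and
# updates are random stand-ins, not the paper's trained networks.
random.seed(0)
DIM = 4
W_h = [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(DIM)]
W_l = [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(DIM)]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def l_step(z_l, z_h, x):
    # Fast L-module: refines its detailed state given the plan z_h.
    return [math.tanh(a + b + c) for a, b, c in zip(matvec(W_l, z_l), z_h, x)]

def h_step(z_h, z_l):
    # Slow H-module: updates the abstract plan from L's converged result.
    return [math.tanh(a + b) for a, b in zip(matvec(W_h, z_h), z_l)]

def hrm_forward(x, n_cycles=4, t_steps=8):
    z_h = [0.0] * DIM
    for _ in range(n_cycles):        # slow, abstract planning cycles
        z_l = [0.0] * DIM            # reset L to avoid early convergence
        for _ in range(t_steps):     # fast, detailed computation steps
            z_l = l_step(z_l, z_h, x)
        z_h = h_step(z_h, z_l)       # refine the strategy with L's output
    return z_h

out = hrm_forward([0.5] * DIM)
print(len(out))
```

The key design choice is visible in `hrm_forward`: the L-state is re-initialized each cycle while the H-state persists, giving the model an effective depth of `n_cycles * t_steps` steps without backpropagating through one enormous chain.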
HRM (left) smoothly converges on the solution across computation cycles, avoiding the early convergence of RNNs (center) and the vanishing gradients of classic deep neural networks (right). Source: arXiv
According to the paper, “This process allows the HRM to perform a sequence of distinct, stable, nested computations, where the H-module directs the overall problem-solving strategy and the L-module executes the intensive search or refinement required for each step.” This nested-loop design allows the model to reason deeply in its latent space without needing long CoT prompts or huge amounts of data.
A natural question is whether this “latent reasoning” comes at the cost of interpretability. Guan Wang, Founder and CEO of Sapient Intelligence, pushes back on this idea, explaining that the model’s internal processes can be decoded and visualized, similar to how CoT provides a window into a model’s thinking. He also points out that CoT itself can be misleading. “CoT does not genuinely reflect a model’s internal reasoning,” Wang told VentureBeat, referencing studies showing that models can sometimes yield correct answers with incorrect reasoning steps, and vice versa. “It remains essentially a black box.”
Example of how HRM reasons over a maze problem across different compute cycles. Source: arXiv
HRM in action
To test their model, the researchers pitted HRM against benchmarks that require extensive search and backtracking, such as the Abstraction and Reasoning Corpus (ARC-AGI), extremely difficult Sudoku puzzles, and complex maze-solving tasks.
The results show that HRM can solve problems beyond the reach of even advanced LLMs. On benchmarks such as “Sudoku-Extreme” and “Maze-Hard,” state-of-the-art CoT models failed completely, scoring zero accuracy. HRM, by contrast, achieved near-perfect accuracy after being trained on only 1,000 examples.
The 27M-parameter model scored 40.3% on the ARC-AGI benchmark, a test of abstract reasoning and generalization. That is higher than much larger CoT-based models such as o3-mini-high (34.5%) and Claude 3.7 Sonnet (21.2%). Achieving this with a tiny pre-training corpus and so few examples highlights the power and efficiency of the architecture.
HRM outperforms large models on complex reasoning tasks. Source: arXiv
While solving puzzles demonstrates the model’s power, the real-world implications lie in a different class of problems. According to Wang, developers should continue using LLMs for language-based or creative tasks, but for “complex or deterministic tasks,” an HRM-like architecture offers superior performance with fewer hallucinations. He points to “sequential problems requiring complex decision-making or long-term planning,” especially in latency-sensitive fields like embodied AI and robotics, or data-scarce domains like scientific exploration.
In these scenarios, HRM doesn’t just solve problems; it learns to solve them better. “In our Sudoku experiments at the master level… HRM needs progressively fewer steps as training advances—akin to a novice becoming an expert,” Wang explained.
For the enterprise, this is where the architecture’s efficiency translates directly to the bottom line. Instead of the serial, token-by-token generation of CoT, HRM’s parallel processing allows for what Wang estimates could be a “100x speedup in task completion time.” This means lower inference latency and the ability to run powerful reasoning on edge devices.
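A back-of-envelope calculation shows how such a speedup could arise; all the numbers below are hypothetical placeholders, not measurements from the paper.

```python
# Hypothetical latency arithmetic: a CoT model emits its reasoning
# serially, token by token, while an HRM-style model runs a fixed
# number of internal compute cycles. Every number here is assumed.
tokens_in_cot_trace = 2000        # assumed length of a reasoning trace
seconds_per_token = 0.02          # assumed autoregressive decode time
cot_latency = tokens_in_cot_trace * seconds_per_token   # 40 s

hrm_cycles = 16                   # assumed forward compute cycles
seconds_per_cycle = 0.025         # assumed per-cycle cost
hrm_latency = hrm_cycles * seconds_per_cycle            # 0.4 s

speedup = round(cot_latency / hrm_latency)
print(speedup)  # 100 under these assumed numbers
```

The point is structural: CoT latency scales with the length of the generated trace, while HRM’s latency is bounded by its fixed cycle count, so the gap widens as reasoning traces get longer.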
The cost savings are also substantial. “Specialized reasoning engines such as HRM offer a more promising alternative for specific complex reasoning tasks compared to large, costly, and latency-intensive API-based models,” Wang said. To put the efficiency into perspective, he noted that training the model for professional-level Sudoku takes roughly two GPU hours, and for the complex ARC-AGI benchmark, between 50 and 200 GPU hours—a fraction of the resources needed for massive foundation models. This opens a path to solving specialized business problems, from logistics optimization to complex system diagnostics, where both data and budget are finite.
Looking ahead, Sapient Intelligence is already working to evolve HRM from a specialized problem-solver into a more general-purpose reasoning module. “We are actively developing brain-inspired models built upon HRM,” Wang said, highlighting promising initial results in healthcare, climate forecasting, and robotics. He teased that these next-generation models will differ significantly from today’s text-based systems, notably through the inclusion of self-correcting capabilities.
The work suggests that for a class of problems that have stumped today’s AI giants, the path forward may not be bigger models, but smarter, more structured architectures inspired by the ultimate reasoning engine: the human brain.