Recent research conducted by the Global AI Research Institute (GAIR) reveals that training large language models (LLMs) for intricate, autonomous functions does not necessitate vast amounts of data. Their innovative framework, LIMI, builds upon prior advancements in LLM studies and highlights a crucial insight: “machine autonomy arises not from sheer data volume but from the deliberate selection of high-quality, agentic demonstrations.”
Put simply, the emphasis should be on data quality rather than data quantity.
In their experiments, the team demonstrated that a small yet meticulously curated dataset of only 78 examples enabled LLMs to surpass models trained on thousands of samples by a significant margin across key industry benchmarks. This breakthrough holds substantial promise for enterprise environments where data is limited or costly to acquire.
Rethinking Autonomous AI: The Data Dilemma
The researchers define agency in AI as "the emergent ability of systems to operate autonomously: actively identifying problems, hypothesizing solutions, and executing tasks through self-guided interaction with environments and tools." Essentially, these AI agents don't just process information; they perform meaningful work.
Traditionally, prevailing training paradigms have assumed that enhancing agentic intelligence requires exponentially larger datasets, as suggested by established scaling laws in language modeling. This assumption often results in complex training pipelines and hefty computational expenses. Moreover, in many specialized domains, relevant data is scarce, difficult to obtain, and expensive to curate.
However, emerging evidence from adjacent fields challenges this notion. For instance, a 2023 study demonstrated effective model alignment using only 1,000 carefully selected examples. Similarly, another recent investigation revealed that advanced mathematical reasoning capabilities could be cultivated from just 817 training samples.
Inspired by these findings, LIMI applies the “less is more” philosophy to the realm of autonomous AI agents, aiming to achieve high performance with minimal data.
The LIMI Framework: Precision Over Volume
LIMI’s core innovation lies in its ability to foster sophisticated agentic intelligence through a compact yet strategically curated set of demonstrations showcasing autonomous behavior. Central to this approach is a structured pipeline for gathering exemplary agentic task demonstrations.
Each demonstration is composed of two elements: a query and a trajectory. The query represents a natural language request posed by a user, such as a software development task or a scientific inquiry.
The trajectory details the step-by-step process the AI undertakes to fulfill the query. This includes the model’s internal reasoning, interactions with external tools (like code interpreters), and feedback from the environment. For example, a query might be “develop a basic task management app,” with the trajectory encompassing the agent’s planning, coding, execution, debugging, and iterative refinement until the goal is met.
These trajectories often involve multiple cycles of planning, action, and reflection, mirroring realistic human-AI collaboration.
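The query/trajectory structure described above can be sketched as a simple data schema. This is an illustrative reconstruction, not the released dataset's actual format; the field names (`kind`, `content`, and the step types) are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical schema for a single LIMI-style demonstration; the field
# names here are illustrative, not taken from the released dataset.

@dataclass
class Step:
    kind: str     # "reasoning", "tool_call", or "observation"
    content: str  # model thoughts, a tool invocation, or environment feedback

@dataclass
class Demonstration:
    query: str  # natural-language task posed by the user
    trajectory: list[Step] = field(default_factory=list)

demo = Demonstration(
    query="Develop a basic task management app",
    trajectory=[
        Step("reasoning", "Plan: scaffold the project, implement CRUD for tasks"),
        Step("tool_call", "python create_app.py"),
        Step("observation", "Traceback: ModuleNotFoundError: No module named 'flask'"),
        Step("tool_call", "pip install flask"),
        Step("reasoning", "Retry the run now that the dependency is installed"),
    ],
)

# A single demonstration interleaves planning, action, and feedback,
# so one trajectory can span several plan-act-reflect cycles.
print(len(demo.trajectory))  # 5
```

Note that the failed step and its recovery are kept in the trajectory rather than edited out; as described below, the problem-solving process itself is part of the training signal.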
To assemble their dataset, the researchers began with 60 real-world queries sourced from professional developers and researchers. They then expanded this collection by synthesizing additional queries derived from GitHub Pull Requests using advanced data augmentation techniques.
A team of four PhD-level computer scientists rigorously evaluated the synthesized queries, ultimately selecting 18 high-quality examples; combined with the 60 original queries, these formed a refined set of 78 queries focused on software engineering and research workflows.
For trajectory generation, the same experts collaborated with a command-line interface (CLI) coding agent powered by GPT-5 to complete each task. This iterative process captured the entire interaction sequence, including back-and-forth exchanges and problem-solving adaptations. Some complex trajectories extended beyond 150,000 tokens, reflecting the depth of the problem-solving journey.
“This methodology ensures that models learn not only from successful outcomes but also from the comprehensive problem-solving process, including strategy adjustments and error recovery during collaborative execution,” the team explains.
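The capture process the team describes can be sketched as a simple agent loop that logs every exchange, including tool failures and the recovery that follows. This is a minimal sketch under stated assumptions: `model_step` and `run_tool` are placeholder hooks for a real LLM call and a sandboxed tool runner, and the message format is illustrative.

```python
# Minimal sketch of trajectory capture during collaborative execution.
# `model_step` and `run_tool` are hypothetical callbacks standing in for
# a real LLM API call and a sandboxed tool executor.

def capture_trajectory(query, model_step, run_tool, max_turns=10):
    trajectory = [{"role": "user", "content": query}]
    for _ in range(max_turns):
        action = model_step(trajectory)  # reasoning plus an optional tool call
        trajectory.append({"role": "assistant", "content": action["text"]})
        if action.get("tool") is None:   # the model declares the task complete
            break
        result = run_tool(action["tool"])  # may succeed or fail
        # Failures stay in the log: strategy adjustments and error
        # recovery are part of the training signal, not noise.
        trajectory.append({"role": "tool", "content": result})
    return trajectory
```

Because nothing is filtered out, the resulting logs grow with task difficulty, which is consistent with the paper's report of trajectories exceeding 150,000 tokens.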
Evaluating LIMI: Superior Performance with Minimal Data
The researchers validated LIMI by fine-tuning an advanced open-source model on their 78-sample dataset and benchmarking it against leading models such as GLM-4.5 and others across multiple agentic skill assessments, including AgencyBench, a benchmark specifically designed to measure autonomous capabilities.
The LIMI-trained model achieved an impressive average score of 73.5% on AgencyBench, outperforming all baseline models by a wide margin; the closest competitor, GLM-4.5, scored 45.1%. This dominance extended to other benchmarks evaluating tool usage, coding proficiency, and scientific computation.
Remarkably, the LIMI model's performance exceeded that of models trained on datasets containing 10,000 samples, delivering superior results with roughly 128 times less data (78 versus 10,000 examples).
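The data-efficiency factor follows directly from the two dataset sizes reported above:

```python
# The 128x figure is simply the ratio of the baseline dataset size
# to LIMI's curated set.
baseline_samples = 10_000
limi_samples = 78
factor = baseline_samples / limi_samples
print(round(factor))  # 128
```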
“This paradigm shift suggests that cultivating true agentic intelligence hinges on grasping its fundamental nature rather than merely scaling up training data,” the researchers conclude. “As industries evolve from AI that thinks to AI that works, LIMI offers a sustainable blueprint for nurturing genuinely autonomous systems.”
Implications for Industry and Future AI Development
The team has made available the tools for data synthesis and training associated with LIMI, providing a practical resource for enterprises aiming to develop specialized AI agents.
Rather than investing in extensive data collection efforts, organizations can now harness their internal expertise and domain specialists to craft small, high-quality datasets tailored to specific agentic tasks. This approach significantly lowers barriers to entry and empowers businesses to build custom AI agents that deliver competitive advantages in critical workflows.
In an era where data acquisition can be a bottleneck, LIMI’s strategy of prioritizing quality over quantity offers a transformative path forward for autonomous AI development.

