Ai2’s new Olmo 3.1 extends reinforcement learning training for stronger reasoning benchmarks

Introducing Olmo 3.1: The Next Evolution in AI Models from the Allen Institute for AI

The Allen Institute for AI (Ai2) has unveiled Olmo 3.1, an enhanced iteration of its advanced AI models, building upon the foundation laid by Olmo 3. This latest version emphasizes improved efficiency, greater transparency, and enhanced control tailored for enterprise applications.

Refined Models for Diverse Applications

Ai2 has upgraded two key variants from the Olmo 3 lineup: Olmo 3.1 Think 32B, which is fine-tuned for sophisticated research tasks, and Olmo 3.1 Instruct 32B, designed to excel in instruction-following, multi-turn conversations, and tool integration. Additionally, the Olmo 3-Base model remains available, optimized for programming, comprehension, and mathematical problem-solving, and serves as a robust base for further fine-tuning.

Extended Reinforcement Learning for Superior Results

To develop Olmo 3.1 Think 32B, Ai2 researchers significantly extended the reinforcement learning (RL) training run. The model underwent an additional 21 days of training on 224 GPUs, with extra epochs over the Dolci-Think-RL dataset. The longer run produced gains across multiple benchmarks: 5+ points on the AIME math competition, 4+ points each on ZebraLogic (logical reasoning) and IFEval (instruction following), and 20+ points on IFBench. The model also demonstrated enhanced capabilities in coding and complex multi-step problem-solving.
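For a rough sense of scale, the 21 days and 224 GPUs cited above can be converted into GPU-hours. The sketch below is back-of-the-envelope arithmetic only: it assumes every GPU ran continuously for the full period, which the article does not confirm, and the function name `gpu_hours` is purely illustrative.

```python
def gpu_hours(days: float, gpus: int) -> float:
    """Upper-bound GPU-hours, assuming all GPUs run 24 hours a day (an assumption)."""
    return days * 24 * gpus


# Figures reported for the extended Olmo 3.1 Think 32B RL run.
total = gpu_hours(days=21, gpus=224)
print(f"{total:,.0f} GPU-hours")  # prints "112,896 GPU-hours"
```

Under that assumption, the extension amounts to roughly 113,000 GPU-hours of additional RL training.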

Scaling Instruction-Tuned Models for Real-World Use

For Olmo 3.1 Instruct 32B, Ai2 applied the successful training approach used for the smaller 7B Instruct model to the larger 32B scale. This resulted in a model optimized for conversational AI, tool utilization, and sustained multi-turn dialogues, making it a powerful tool for practical applications in customer service, virtual assistants, and interactive AI systems.

Currently, these updated models are accessible via the Ai2 Playground and Hugging Face platforms, with API integration expected to be available shortly.

Benchmark Excellence and Competitive Edge

Olmo 3.1 models have demonstrated superior performance in various standardized evaluations, consistently outperforming their Olmo 3 predecessors. Notably, Olmo 3.1 Think 32B surpassed the Qwen 3 32B model on the AIME 2025 benchmark and delivered results comparable to the Gemma 27B model.

Meanwhile, Olmo 3.1 Instruct 32B outshone several open-source competitors, including Gemma 3, particularly in mathematics benchmarks. Ai2 highlights that this model represents their most advanced fully open instruction-tuned chat model at the 32B parameter scale, excelling in both dialogue and tool-based tasks.

In parallel, Ai2 has also updated its RL-Zero 7B models, which focus on math and coding. These models now benefit from longer, more stable training runs, which Ai2 says improves their reliability and accuracy.

Transparency and Open-Source Commitment

Ai2 continues to prioritize openness and user empowerment in AI development. The Olmo 3 series was designed to provide enterprises and research institutions with greater insight into the data sources and training methodologies behind the models. This transparency enables organizations to customize and retrain the models by incorporating their own datasets, fostering adaptability and continuous improvement.

Supporting this ethos, Ai2 offers tools that trace how large language model outputs correlate with their training data, enhancing accountability and interpretability.

As Ai2 states, “Olmo 3.1 Think 32B and Olmo 3.1 Instruct 32B exemplify how openness and high performance can coexist. By refining the same model architecture, we advance capabilities while maintaining full transparency over data, code, and training decisions.”

Looking Ahead

With Olmo 3.1, Ai2 sets a new standard for open, powerful AI models that balance cutting-edge performance with enterprise-grade transparency and control. As AI adoption grows across industries, models like Olmo 3.1 are poised to play a pivotal role in driving innovation, enabling more intelligent automation, and fostering trust through openness.
