Mistral has just updated its open-source Small model from version 3.1 to version 3.2. Here’s why.

French AI darling Mistral keeps the new releases coming this summer.

Only days after announcing its own AI-optimized cloud service, Mistral Compute, the well-funded firm has shipped an update to its 24B-parameter model: Mistral Small has moved from version 3.1 to 3.2-24B-Instruct-2506.

This new version builds directly on Mistral Small 3.1, aiming to improve specific behaviors such as instruction following, output stability, and function-calling robustness. While the overall architecture remains unchanged, the update brings targeted refinements that show up in both internal evaluations and public benchmarks.

Mistral AI says Small 3.2 adheres better to precise instructions and reduces the likelihood of infinite or repetitive generations, a problem sometimes seen in previous versions when handling long or ambiguous requests.

The function-calling template was also upgraded to support more reliable tool-use scenarios, especially in frameworks such as vLLM.
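To make the tool-use point concrete, here is a minimal sketch of the OpenAI-style tool definition that vLLM's chat completions endpoint accepts for function calling. The model repo ID and the `get_weather` tool are illustrative assumptions, not details confirmed by this article; the payload is only constructed, not sent.

```python
# Sketch of an OpenAI-style function-calling request for a vLLM server.
# The tool and model ID below are hypothetical examples.
import json

get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# The payload you would POST to a running server's /v1/chat/completions
# endpoint (not actually sent here).
payload = {
    "model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",  # assumed repo ID
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [get_weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

print(json.dumps(payload, indent=2))
```

With `tool_choice` set to `"auto"`, the model returns either a normal text reply or a structured tool call; the template improvements in Small 3.2 are aimed at making that second path more dependable.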

At the same time, the model can run on a single 80 GB Nvidia A100 or H100 GPU, opening up options for businesses with limited computing resources or budgets.
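A rough back-of-the-envelope sketch shows why a 24B-parameter model fits on one 80 GB card at half precision (this is an estimate of weight memory only; real deployments also need room for the KV cache and framework overhead):

```python
# Rough memory estimate for serving a 24B-parameter model in bf16/fp16.
params = 24e9        # parameter count
bytes_per_param = 2  # bf16 and fp16 both use 2 bytes per parameter

weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")            # ~48 GB

# What's left on an 80 GB A100/H100 goes to the KV cache,
# activations, and serving-framework overhead.
print(f"headroom on an 80 GB GPU: ~{80 - weights_gb:.0f} GB")  # ~32 GB
```

The ~48 GB of weights plus runtime overhead is consistent with a single-GPU deployment, while an 8-bit or 4-bit quantized variant would shrink the footprint further.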

Small 3.2 arrives only three months after Mistral Small 3.1, which was released in March 2025 as a flagship model in the 24B-parameter range. It offered multimodal capabilities, multilingual comprehension, and long-context processing of up to 128K tokens.

The model was positioned explicitly against proprietary peers such as GPT-4o Mini and Claude 3.5 Haiku, and, according to Mistral, outperformed them on many tasks.

Small 3.1 emphasized deployment efficiency, with claims of running at 150 tokens per second and operating within 32 GB of RAM. The release included both base and instruct checkpoints, allowing flexibility for fine-tuning across domains such as legal, medical, and technical fields.

Small 3.2, by contrast, focuses on surgically improving behavior and reliability. It introduces no new architecture or capabilities; it is a maintenance release that fixes edge cases, tightens instruction compliance, and refines system-prompt interactions.

What changed between Small 3.2 and Small 3.1?

Instruction-following benchmarks show a small but measurable improvement. Mistral's internal accuracy rose from 82.75% for Small 3.1 to 84.78% for Small 3.2.

Similarly, performance on external datasets like Wildbench v2 and Arena Hard v2 improved significantly—Wildbench increased by nearly 10 percentage points, while Arena Hard more than doubled, jumping from 19.56% to 43.10%.

Internal metrics also suggest reduced output repetition. The rate of infinite generations dropped from 2.11% in Small 3.1 to 1.29% in Small 3.2, a relative reduction of nearly 40%. This makes the model more reliable for developers building applications that require consistent, bounded responses.

Performance across text and coding benchmarks presents a more nuanced picture. Small 3.2 showed gains on HumanEval Plus (88.99% to 92.90%), MBPP Pass@5 (74.63% to 78.33%), and SimpleQA. It also modestly improved MMLU Pro and MATH results.

Vision benchmarks remain mostly consistent, with slight fluctuations. ChartQA and DocVQA saw marginal gains, while AI2D and Mathvista dropped by less than two percentage points. Average vision performance decreased slightly from 81.39% in Small 3.1 to 81.00% in Small 3.2.

This aligns with Mistral’s stated intent: Small 3.2 is not a model overhaul, but a refinement. As such, most benchmarks are within expected variance, and some regressions appear to be trade-offs for targeted improvements elsewhere.

However, as AI power user and influencer @chatgpt21 commented on X: "It got worst on MMLU," referring to the Massive Multitask Language Understanding benchmark, a multidisciplinary test spanning 57 subjects used to assess LLM performance across domains. Small 3.2 scored 80.50%, slightly below Small 3.1's 80.62%.

An open source license makes it more attractive to cost-conscious users and those focused on customization

Both Small 3.1 and Small 3.2 are available under the Apache 2.0 license via the popular AI code-sharing repository Hugging Face, itself a startup with roots in France and New York City.

Small 3.2 is supported by frameworks such as vLLM and Transformers, and requires approximately 55 GB of GPU RAM to run in bf16 or fp16 precision. The model repository includes system-prompt and inference examples for developers looking to build or serve applications.
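As a deployment sketch (the repo ID is assumed from Mistral's usual Hugging Face naming, which this article does not spell out), serving the model with vLLM's OpenAI-compatible server might look like:

```shell
# Sketch: launch an OpenAI-compatible vLLM server for Small 3.2.
# Repo ID assumed; the Mistral-specific flags follow vLLM's conventions
# for loading Mistral-format checkpoints and parsing its tool calls.
vllm serve mistralai/Mistral-Small-3.2-24B-Instruct-2506 \
  --tokenizer-mode mistral \
  --config-format mistral \
  --load-format mistral \
  --tool-call-parser mistral \
  --enable-auto-tool-choice
```

Once running, the server exposes `/v1/chat/completions`, so existing OpenAI-client code can point at it with only a base-URL change.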

Mistral Small is already integrated into platforms such as Google Cloud Vertex AI, and is scheduled to land on NVIDIA NIM and Microsoft Azure. For now, Small 3.2 appears to be available only through self-serve and direct deployment.

What enterprises should consider when evaluating Mistral Small for their use cases

Mistral Small may not change the competitive positioning of open-weight models, but it represents Mistral AI’s commitment to iterative refinement.

Small 3.2's noticeable improvements in reliability and task handling, especially around instruction precision and tool use, offer a cleaner experience for developers and enterprises building on the Mistral ecosystem.

The fact that it is made by a French company and complies with EU rules and regulations such as GDPR and the EU AI Act also makes it appealing to enterprises in that region.

For those seeking the biggest jumps in benchmark performance, Small 3.1 remains a useful reference point, especially since Small 3.2 does not outperform it in every case. That makes the update more a stability release than an upgrade, depending on the use case.
