Chinese web and e-commerce giant Alibaba’s Qwen Team has launched Qwen3, an open-source series of large language models (LLMs) that appears to be among the state of the art for open models and approaches the performance of proprietary offerings from the likes of OpenAI and Google.
Qwen3 features two “mixture-of-experts” (MoE) models and six dense models, for a total of eight new models. The mixture-of-experts approach combines several specialized sub-models, or “experts,” into one network; only the experts relevant to the task at hand are activated among the model’s internal settings (known as parameters), which reduces the compute required per query (see the sketch below). French AI startup Mistral popularized the approach with its open-source models.

According to the team, the 235-billion-parameter version of Qwen3, codenamed “A22B,” outperforms DeepSeek’s open-source R1 as well as OpenAI’s proprietary o1 on key third-party benchmarks, including ArenaHard (with over 500 questions in software engineering and mathematics), and approaches the performance of Google’s new, proprietary Gemini 2.5 Pro.
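To make the routing idea concrete, here is a minimal top-k gating sketch in PyTorch. It is a generic illustration of how MoE layers work, not Qwen3’s actual implementation; the expert count, layer sizes, and top-k value are arbitrary choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Illustrative top-k mixture-of-experts layer (not Qwen3's actual code)."""

    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        # Each "expert" is a small feed-forward sub-network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model). The router scores every expert per token,
        # but only the top-k experts actually run; the rest stay idle, which
        # is why active parameters per call are a fraction of the total.
        scores, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(scores, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TinyMoELayer(d_model=64)
tokens = torch.randn(10, 64)   # 10 token embeddings
print(layer(tokens).shape)     # torch.Size([10, 64])
```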
Overall, the benchmark data positions Qwen3-235B-A22B as one of the most powerful publicly available models, matching or surpassing major industry offerings.
Hybrid reasoning approach
Qwen3 models have been trained to provide “hybrid” or “dynamic” reasoning capabilities. This allows users to toggle between quick, accurate responses and more time-consuming, compute-intensive reasoning steps (similar to OpenAI’s “o” series) for harder questions in science, mathematics, engineering, and other specialized fields. It is an approach pioneered by Nous Research and other AI startups and research collectives.
Qwen3 allows users to engage the more intensive “Thinking Mode” using a button marked as such on the Qwen Chat website, or by embedding prompts like /think and /no_think when deploying the model through the API or locally. This allows flexible use depending on task complexity.
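For those running the model themselves, here is a minimal sketch of toggling the mode programmatically, following the pattern shown on the Qwen3 Hugging Face model cards; the checkpoint choice and generation settings are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The small 0.6B checkpoint is used here only so the sketch runs on modest hardware.
model_name = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [{"role": "user", "content": "How many prime numbers are there below 30?"}]

# The chat template exposes an enable_thinking switch for hybrid mode;
# the /think and /no_think tags inside a prompt toggle the same behavior per turn.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # set False for a quick, non-reasoning reply
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```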
Users can now access and deploy the models across platforms such as Hugging Face, ModelScope, and GitHub, or interact with them directly through the Qwen Chat web interface and mobile applications. The release includes both mixture-of-experts (MoE) and dense models, all available under the Apache 2.0 license.
During my limited use of the Qwen Chat site, it generated images with decent speed and accuracy, especially when rendering text natively within the image. It often asked me to log in, however, and I was subject to the usual Chinese content restrictions (such as a ban on prompts or responses related to the Tiananmen Square protests).
In addition to the MoE offerings, Qwen3 includes dense models at different scales: Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B.
These models vary in size and architecture, offering users options to fit diverse needs and computational budgets.
The Qwen3 models also significantly expand multilingual support, now covering 119 languages and dialects across major language families. This broadens the models’ potential applications globally, facilitating research and deployment in a wide range of linguistic contexts.
Model training and architecture
Qwen3 represents a significant improvement over its predecessor, Qwen2.5, in terms of training: the pretraining dataset has doubled in size to 36 trillion tokens.
Data sources include web crawls and PDF-like document extractions, as well as synthetic content generated by previous Qwen models that focused on math and code.
A three-stage pretraining process was followed by a four-stage post-training refinement to enable the hybrid thinking and non-thinking capabilities. These training improvements allow the dense base Qwen3 models to match or surpass the performance of larger Qwen2.5 models.
Deployment options are flexible: users can integrate Qwen3 using frameworks like SGLang and vLLM, both of which offer OpenAI-compatible endpoints.
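As a minimal sketch, assuming a server has already been started locally with a command like `vllm serve Qwen/Qwen3-30B-A3B` (the model name, port, and prompt below are placeholders), an existing OpenAI-style client only needs its base URL repointed:

```python
from openai import OpenAI

# vLLM's default local endpoint; SGLang exposes a similar OpenAI-compatible API.
# The api_key value is a placeholder since the local server does not check it.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",  # must match the model the server was launched with
    messages=[
        {"role": "user", "content": "Summarize the mixture-of-experts idea in two sentences."}
    ],
    temperature=0.6,
)
print(response.choices[0].message.content)
```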
For local use, tools such as Ollama, LM Studio, and MLX are recommended. Users interested in the models’ agentic capabilities are encouraged to explore Qwen-Agent, which simplifies tool-calling operations.
Junyang Lin, a member of the Qwen team, noted on X that building Qwen3 required addressing less glamorous but equally important technical challenges, such as scaling reinforcement learning stably, balancing data from multiple domains, and expanding multilingual performance without sacrificing quality. Lin also said the team’s focus is shifting to training agents capable of long-horizon reasoning for real-world tasks.
What it means for enterprise decision-makers
Engineers can point existing OpenAI-compatible endpoints at the new model within hours instead of weeks. The MoE checkpoints (235B parameters with 22B active, and 30B with 3B active) deliver GPT-4-class reasoning at roughly the per-query compute cost of a dense 20-30B model, since only a fraction of the parameters fire on each call (the full weights must still fit in memory).
Official LoRA and QLoRA hooks enable private fine-tuning without sending proprietary data to a third-party vendor.
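A minimal sketch of such a private fine-tuning setup using the Hugging Face peft library; the checkpoint, target modules, and hyperparameters are illustrative choices, not official Qwen recommendations:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative LoRA setup; dataset loading and the training loop itself
# (e.g. via transformers.Trainer) are omitted for brevity.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```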
Dense variants from 0.6B up to 32B allow easy prototyping on laptops and scaling to multi-GPU clusters without rewriting prompts.
By running the weights locally, organizations can log all prompts and outputs. MoE sparsity also reduces the number of active parameters per call, and with it the inference attack surface.
The Apache 2.0 license removes usage-based legal barriers, but organizations should still review the export-control and governance implications of using a model trained by a Chinese vendor.
Qwen3 also offers a viable alternative to models from other Chinese players such as DeepSeek, Tencent, and ByteDance, as well as from the growing number of North American model providers, including the aforementioned OpenAI, plus Google, Microsoft, Anthropic, Amazon, Meta, and others. The permissive Apache 2.0 license, which allows unlimited commercial use, is a major advantage over other open-source players such as Meta, whose licenses tend to be more restrictive.
It also shows that the race to provide ever more powerful and accessible models remains fierce. Smart organizations looking to reduce costs should stay flexible and open to evaluating new models for their AI agents and workflows.
Looking ahead
The Qwen team positions Qwen3 not just as an incremental improvement, but as a step toward its future goals of artificial general intelligence (AGI) and artificial superintelligence (ASI), AI that is significantly smarter than humans.
The next phase for Qwen will include scaling up data and model sizes, extending context lengths, broadening modality support, and advancing reinforcement learning with feedback from the environment.
As the landscape of large-scale AI research continues to evolve, Qwen3’s open-weight release under an accessible license marks another important milestone, lowering barriers for researchers, developers, and organizations that want to innovate with state-of-the-art LLMs.