Latest Alibaba AI model demos AI improvements

The latest model from Chinese public cloud provider Alibaba demonstrates how reinforcement learning is driving AI effectiveness

Published on: 7 Mar 2025 15:42

Two months after the DeepSeek R1 AI model shook the tech world, Alibaba Cloud introduced QwQ-32B, an open-source large language model.

Alibaba Cloud describes the new model as a “compact reasoning model” that uses only 32 billion parameters, yet claims it can deliver performance comparable with large language models that use far greater numbers of parameters. Alibaba Cloud published benchmarks on its website that suggest the new model is comparable with AI models from DeepSeek and OpenAI. These benchmarks include AIME 24 (mathematical reasoning), LiveCodeBench (coding proficiency), LiveBench (test set contamination and objective evaluation), IFEval (instruction-following ability) and BFCL (tool and function-calling capabilities).

By using continuous reinforcement learning (RL) scaling, Alibaba claimed, the QwQ-32B model shows significant improvements in mathematical reasoning and coding proficiency.

In a blog post, the company stated that QwQ-32B uses 32 billion parameters yet achieves performance comparable with DeepSeek R1, which uses 671 billion parameters. Alibaba said this shows the effectiveness of RL when applied to robust foundation models pretrained with extensive world knowledge. “We have integrated agent-related abilities into the reasoning model,” the company added. “This allows it to think critically while using tools and adapting its reasoning based on feedback from the environment.”

Alibaba stated that QwQ-32B demonstrates the effectiveness of using RL to enhance reasoning abilities. This approach to AI training enables a system to perceive, interpret and act in its environment, learning by trial and error rather than from labelled examples. Alibaba said: “We have seen the enormous potential of scaled RL and also realised the untapped opportunities within pretrained language models. As we develop the next generation of Qwen, we are confident that combining stronger foundation models with RL powered by scaled computing resources will propel us towards achieving artificial general intelligence [AGI].”
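As a rough illustration of that trial-and-error loop, the Python sketch below shows a toy agent learning the value of two actions purely from reward feedback. It is a generic textbook example, assuming made-up payout rates and an arbitrary exploration rate; it is not Alibaba's training code.

    # Minimal trial-and-error reinforcement learning on a two-armed bandit.
    # All numbers here (payout rates, exploration rate) are invented for
    # illustration; this is not how QwQ-32B was trained.
    import random

    true_reward_prob = [0.3, 0.7]   # hidden payout rate of each action
    value_estimate = [0.0, 0.0]     # the agent's learned value of each action
    counts = [0, 0]
    epsilon = 0.1                   # how often the agent explores at random

    for step in range(10_000):
        # Explore occasionally; otherwise exploit the best-known action
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = max(range(2), key=lambda a: value_estimate[a])

        # The environment returns a reward; the agent never sees the true odds
        reward = 1.0 if random.random() < true_reward_prob[action] else 0.0

        # Incremental average: nudge the estimate towards the observed reward
        counts[action] += 1
        value_estimate[action] += (reward - value_estimate[action]) / counts[action]

    print(value_estimate)  # converges towards [0.3, 0.7]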

Using rewards from a general reward model and rule-based verifiers, QwQ-32B was trained to enhance its general capabilities. According to Alibaba, these include better instruction-following, alignment with human preferences and improved agent performance.
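To make that combination concrete, here is a hypothetical Python sketch of scoring a single model response with a rule-based verifier plus a learned reward model. The function names, the exact-match maths check, the placeholder 0.8 score and the 0.7 weighting are all assumptions for illustration; Alibaba has not published its reward code.

    import re

    def rule_based_maths_verifier(response: str, expected_answer: str) -> float:
        # Rule-based signal: 1.0 if the last number in the response matches
        # the reference answer exactly, else 0.0.
        numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
        return 1.0 if numbers and numbers[-1] == expected_answer else 0.0

    def general_reward_model(prompt: str, response: str) -> float:
        # Stand-in for a learned preference model rating helpfulness on a
        # 0-1 scale; a real system would query a trained neural reward model.
        return 0.8  # placeholder score, assumed for this sketch

    def combined_reward(prompt: str, response: str, expected_answer: str,
                        w_rule: float = 0.7) -> float:
        # Blend the verifiable rule-based score with the learned preference
        # score; the 0.7 weighting is an arbitrary choice for this sketch.
        rule_score = rule_based_maths_verifier(response, expected_answer)
        pref_score = general_reward_model(prompt, response)
        return w_rule * rule_score + (1 - w_rule) * pref_score

    print(combined_reward("What is 12 * 7?", "12 * 7 equals 84", "84"))  # 0.94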

China’s DeepSeek, whose models have been available since the beginning of the year, demonstrates RL’s effectiveness through its ability to deliver benchmark results comparable with rival US large language models. Its R1 LLM can rival US artificial intelligence models without needing the latest GPU hardware.

It is no coincidence that Alibaba’s QwQ-32B model uses RL. The US has banned exports of high-end AI acceleration chips, such as the Nvidia H100 graphics processor, to China, which means Chinese AI developers have had to find alternative ways to make their models work. Using RL appears to produce benchmark results comparable with those achieved by models such as OpenAI’s.

Because QwQ-32B uses far fewer parameters than DeepSeek R1 to achieve similar results, it should be able to run on less powerful AI accelerator hardware.
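A back-of-the-envelope calculation shows why the parameter count matters for hardware. Assuming 16-bit weights (two bytes per parameter) and ignoring activation memory, the KV cache and any sparsity or quantisation tricks, the weights alone would occupy roughly:

    def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
        # Memory needed just to hold the model weights, in gigabytes,
        # under the simplifying 16-bit assumption stated above.
        return n_params * bytes_per_param / 1e9

    print(weight_memory_gb(32e9))   # QwQ-32B: ~64 GB of weights
    print(weight_memory_gb(671e9))  # DeepSeek R1: ~1,342 GB of weights

On that rough basis, QwQ-32B’s weights could fit on a single high-memory accelerator, whereas a model of DeepSeek R1’s total size would need a multi-GPU cluster.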
