DeepSeek isn’t the only Chinese LLM maker OpenAI and friends have to worry about. Right, Alibaba?

Analysis Silicon Valley has had to face the reality of DeepSeek’s claims that it can train large language models (LLMs) that rival America’s best. But the startup isn’t the only Chinese model builder the US has to worry about. This week, Chinese cloud and ecommerce giant Alibaba unveiled an array of LLMs, among them a model called Qwen 2.5 Max, which it says outperforms DeepSeek’s V3 – the foundation for its reasoning-capable R1 – as well as America’s top models.

We always recommend taking benchmarks with a grain of salt, but if Alibaba’s claims are to be believed, Qwen 2.5 Max – which can search the web and turn prompts into text, images, and video – managed to outperform OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet across the popular Arena-Hard, MMLU-Pro, GPQA-Diamond, LiveCodeBench, and LiveBench benchmark suites. That may also explain why OpenAI’s flagship models, o1 and GPT-4o, were the points of comparison.

Alibaba’s Qwen 2.5 Max compared to the competition

In any case, the announcement further fuels the impression that, despite the West’s ongoing efforts to stifle Chinese AI, the US lead may not be as great as previously thought – and the perception that Silicon Valley’s habit of pouring countless billions of dollars into artificial intelligence development is a little excessive.

Feeds and speeds – or the lack thereof

Unfortunately, the Qwen team at Alibaba is keeping the details of its latest model release under wraps. Despite the performance claims, the API, and a chatbot on the web, Qwen 2.5 Max isn’t available for download – you can only access it running on Alibaba’s servers.
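For what it’s worth, Alibaba’s cloud advertises an OpenAI-compatible chat-completions endpoint for its hosted Qwen models, so querying Qwen 2.5 Max should look roughly like the sketch below. The base URL, model identifier, and environment variable name here are our assumptions – check Alibaba Cloud’s documentation for the values that apply to your account and region.

```python
# Minimal sketch: calling a hosted Qwen model through an OpenAI-compatible
# chat-completions endpoint. The base_url, model name, and env var are
# assumptions -- consult Alibaba Cloud's docs for the current values.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],  # hypothetical env var name
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

response = client.chat.completions.create(
    model="qwen-max",  # assumed identifier for Qwen 2.5 Max
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How many parameters do you have?"},
    ],
)

print(response.choices[0].message.content)
```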

What we do know is that Qwen 2.5 Max is a large-scale mixture-of-experts (MoE) model, trained on 20 trillion tokens and then refined with supervised fine-tuning and reinforcement learning from human feedback.

MoE models – like Mistral’s Mixtral series or DeepSeek’s V3 and R1 – combine a number of artificial “experts” that are effectively specialized for particular tasks, such as coding or math.

Model builders are increasingly turning to the MoE architecture as a way of decoupling parameter count from compute cost. Because only a fraction of the model is activated for any given request, parameter counts can keep growing without a corresponding jump in the compute needed to serve each token. That is to say, rather than running every input through the entire multi-billion-parameter network and performing all of those calculations per token, only the experts relevant to the query are used, so outputs are generated faster and more cheaply.
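To make that concrete, here is a minimal, toy sketch of top-k expert routing in PyTorch. This is the general MoE pattern rather than Qwen’s actual architecture, whose details Alibaba hasn’t published; the layer sizes and expert count are arbitrary.

```python
# Toy illustration of mixture-of-experts routing -- not Qwen's actual design.
# A router picks the top-k experts per token, so only a fraction of the
# total parameters participate in any single forward pass.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)          # scores each expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                     # x: (tokens, d_model)
        scores = self.router(x)                               # (tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)     # best k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(ToyMoELayer()(tokens).shape)   # torch.Size([10, 64])
```

With eight experts and top-2 routing, each token touches only a quarter of the layer’s feed-forward parameters – which is the whole point of the trick.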

Alibaba hasn’t disclosed the exact size of Qwen 2.5 Max, though we do know the previous Qwen Max weighed in at roughly 100 billion parameters. The Register has reached out to Alibaba for comment, and we’ll let you know if and when we hear back. We also asked Qwen 2.5 Max to share its specs via its online chatbot form, but it didn’t seem to know anything about itself – and even if it had given us a number, we’d have no way to verify it.

Performance at what price?

We may never get hold of Qwen 2.5 Max’s neural network weights. Alibaba Cloud lists the model as proprietary, which could explain why the Chinese super-corp is sharing so little information about it.

Many model builders do not disclose parameter counts or other key details. Alibaba is no exception.

The lack of detail makes evaluating model performance tricky, since performance has to be weighed against cost. One model may outperform another on benchmarks, but if it costs three to four times as much to run, it may not be worth it. That appears to be the situation with Qwen 2.5 Max.

Alibaba’s website currently lists API access to the model at $10 per million input tokens and $30 per million tokens generated. OpenAI, by comparison, charges $2.50 per million input tokens and $10 per million output tokens for GPT-4o – or half that if you opt for batch processing.

Having said that, Qwen 2.5 Max is still cheaper than OpenAI’s flagship o1 model, which runs $15 per million input tokens and $60 per million output tokens.
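For a rough sense of scale, here’s the back-of-the-envelope arithmetic on those list prices for a hypothetical workload of one million input tokens and one million output tokens:

```python
# Back-of-the-envelope cost comparison using the per-million-token list
# prices quoted above (USD). The workload size is hypothetical.
PRICES = {                       # (input $/1M tokens, output $/1M tokens)
    "Qwen 2.5 Max": (10.00, 30.00),
    "GPT-4o":       (2.50, 10.00),
    "OpenAI o1":    (15.00, 60.00),
}

input_tokens = 1_000_000         # hypothetical workload
output_tokens = 1_000_000

for model, (in_price, out_price) in PRICES.items():
    cost = (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price
    print(f"{model:<13} ${cost:6.2f}")

# Qwen 2.5 Max  $ 40.00
# GPT-4o        $ 12.50  (and half that again with batch processing)
# OpenAI o1     $ 75.00
```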

A growing family

As previously mentioned, Qwen 2.5 Max is just the latest in a series of LLMs Alibaba has been releasing since 2023. In September, the company began publishing the weights of its latest generation of models, known as Qwen 2.5, and has since released 0.5, 1.5, 3, 7, 14, and 32-billion-parameter versions.

Alibaba claimed the largest models in its Qwen 2.5 line could compete with – and in some cases even surpass – Meta’s 405-billion-parameter Llama 3.1. Again, we recommend taking such claims with a grain or two of salt.

In addition to its general-purpose LLMs, Alibaba has released the weights of several math- and code-optimized LLMs, and extended access to a pair of proprietary models called Qwen Plus and Qwen Turbo. These models allegedly boast performance within spitting distance of GPT-4o and GPT-4o mini.

In December, it detailed its OpenAI o1-style “thinking” model, QwQ. And this week, in the run-up to the Qwen 2.5 Max launch, the cloud provider announced a trio of open vision-language models (VLMs) weighing in at 3, 7, and 72 billion parameters. Alibaba claims the largest of these is competitive with Google’s Gemini 2, OpenAI’s GPT-4o, and Anthropic’s Claude 3.5 Sonnet, at least on vision benchmarks.

As if that weren’t enough, Alibaba also released updated versions of its 7- and 14-billion-parameter Qwen 2.5 models this week, extending their context window – essentially their short-term memory – to a million tokens.

Longer context windows are particularly useful for retrieval-augmented generation (RAG), since they let models digest large quantities of retrieved information without losing track of it.
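To see why a roomier window matters for RAG, here’s a minimal, self-contained sketch of the idea: score stored documents against the query, pack as many of the best matches as the context window allows, and hand the lot to the model. A toy keyword-overlap scorer stands in for a real embedding search, and the crude word-count token estimate is just for illustration.

```python
# Toy retrieval-augmented generation sketch: rank documents against a query,
# then pack as many top matches as the context window allows into the prompt.
# Real systems use embedding search; keyword overlap keeps this self-contained.
def score(query: str, doc: str) -> int:
    return len(set(query.lower().split()) & set(doc.lower().split()))

def build_prompt(query: str, docs: list[str], window_tokens: int) -> str:
    budget = window_tokens - len(query.split()) - 50           # reserve room for instructions
    picked = []
    for doc in sorted(docs, key=lambda d: score(query, d), reverse=True):
        doc_tokens = len(doc.split())                           # crude token estimate
        if doc_tokens <= budget:
            picked.append(doc)
            budget -= doc_tokens
    context = "\n\n".join(picked)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {query}"

docs = ["Qwen 2.5 ships in sizes from 0.5B to 32B parameters.",
        "Alibaba extended the 7B and 14B models to a million-token context window."]
print(build_prompt("How large is Qwen 2.5's context window?", docs, window_tokens=1_000_000))
```

A million-token window simply means far more retrieved material fits into that budget before anything has to be dropped.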

  • China’s DeepSeek has just released a free competitor to OpenAI’s o1 — here’s how you can use it on your computer
  • What happens when we can’t build bigger AI datacenters?
  • US AI shares are still holding up after yesterday’s DeepSeek shock
  • DeepSeek hasn’t finished with OpenAI yet – Janus Pro is aiming for DALL-E 3.

Questions remain

Despite the hype and market volatility Chinese model builders have caused over the past week, concerns remain – particularly around data privacy and censorship.

We’ve previously pointed out that DeepSeek’s online services store user data in China, per its privacy policy. Alibaba’s Qwen Chat, for its part, may store data in either its Singapore or its Chinese datacenters.

For some, this will be a non-issue; for others, it’s a legitimate concern. As OpenAI API developer Steve Heidel put it on X this week: “Americans sure love giving their data away to the CCP in exchange for free stuff.”

Concerns have also been raised over the censorship of controversial subjects that might paint Beijing in an unfavorable light. As we’ve seen with previous Chinese models, those from DeepSeek and Alibaba will omit sensitive topics, halt generation early, or flat-out refuse to answer questions about things like the Tiananmen Square massacre or the political status of Taiwan. ®
