DeepSeek releases new R1-0528 model on Hugging Face, rivaling top AI models in coding
In late May, Chinese AI startup DeepSeek quietly released the latest iteration of its R1 large-language model, DeepSeek-R1-0528, on the Hugging Face platform. In the preceding days, DeepSeek had told its user community that a “small upgrade” of the R1 model was ready for testing on the official website, mobile app, and mini program. Although billed as a minor upgrade, initial tests by developers show significant improvements to the model’s coding, reasoning, and interactive performance. Early adopters hailed the new model; one called it “a huge win for open source”.
Near OpenAI-level coding performance
According to the LiveCodeBench leaderboard, a competitive coding benchmark, DeepSeek-R1-0528 sits in fourth place as of May 29, 2025, with a Pass@1 score of 73.1 – just behind OpenAI’s o3 (high) at 75.8 and o4-mini at 80.2. This indicates that DeepSeek’s new model is nearly equal to OpenAI’s advanced proprietary models in coding proficiency, an impressive result for an open-source model, and developers have celebrated the release as a community milestone. Anthropic’s Claude 4, a widely respected top coding model, did not appear on the LiveCodeBench leaderboard, likely due to API rate limitations, so a direct comparison isn’t yet available.
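For context, Pass@1 is the standard functional-correctness metric used by LiveCodeBench and similar coding benchmarks: roughly, the fraction of problems a model solves on its first sampled attempt. The sketch below shows the standard unbiased pass@k estimator introduced with HumanEval; the sample counts in the example are purely illustrative, not LiveCodeBench data.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n = samples generated for a problem,
    c = samples that passed the tests, k = evaluation budget."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example with made-up numbers: 10 samples per problem on a tiny benchmark.
results = [(10, 8), (10, 7), (10, 9)]  # (n, c) per problem -- illustrative only
score = sum(pass_at_k(n, c, k=1) for n, c in results) / len(results)
print(f"Pass@1 = {score:.3f}")  # -> Pass@1 = 0.800
```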
DeepSeek-R1-0528 keeps the same Mixture-of-Experts (MoE) architecture, but at a much larger scale. Early reports put the model at roughly 670-685 billion total parameters, of which only about 37 billion are active for any given token thanks to the sparse MoE design; that sparsity keeps inference efficient even as the model scales up. The latest version also dramatically expands the context window, supporting context lengths of up to 164K tokens in some tests, so R1-0528 can ingest and reason about large documents or codebases – far beyond earlier models that were limited to a few thousand tokens.
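For readers unfamiliar with sparse MoE, the sketch below illustrates the basic idea in PyTorch: a router selects a few experts per token, so only a small fraction of the layer’s parameters do any work for a given token. The layer sizes, expert count, and top-k value are toy values chosen for illustration, not DeepSeek’s actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy sparse MoE feed-forward layer: a router picks top_k experts per token,
    so only a small fraction of the layer's parameters are used for each token.
    Sizes here are illustrative and far smaller than DeepSeek's real model."""
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (num_tokens, d_model)
        scores = self.router(x)                  # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # send each token to its chosen experts
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

tokens = torch.randn(4, 64)
print(ToyMoELayer()(tokens).shape)  # torch.Size([4, 64])
```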
Tests and reactions of developers
Early feedback from users suggests that DeepSeek-R1-0528 is on par with, or even better than, Anthropic’s Claude 4 in many scenarios. Developers rushed to pit the new model against closed-source and open models alike. Karminski, an AI blogger and founder of the KCORES project, asked both models to generate a 3D simulation of an orange ball crashing into a surface. DeepSeek’s output was more realistic, with orange-tinted diffuse reflections on the wall and a polished control-panel UI, while Claude 4’s version was plainer. DeepSeek also wrote 728 lines of code versus Claude’s 542, indicating a more detailed implementation. The developer who ran this experiment praised R1-0528 for its thoroughness and improved visuals.
A second comparison shared on social media asked different models to create an “airplane shooting” game. DeepSeek-R1-0528, Claude 4, and the older DeepSeek V3-0324 were all tasked with generating the game code. The new R1-0528 produced the game and added extra features, gameplay elements, and visuals, resulting in a richer, more complex output than Claude 4’s version: Claude’s bare-bones build was missing projectiles and power-up items, while R1-0528 included them – a sign of the model’s stronger grasp of coding and greater creativity. Testers report that R1-0528 can generate over 1,000 lines of clean, bug-free code in a single pass, and it handles front-end web design tasks with greater precision. In experiments, the model produced interactive web components with accurate functionality and styling, something older versions struggled to do.
Despite the impressive anecdotes, industry experts urge caution until more systematic assessments are completed, pointing out that results can vary depending on the prompt and use case. In community discussions, a number of users humorously rated R1-0528’s coding ability at “Claude 3.7”, implying that it is close to Claude 4 and a significant improvement over Claude 3.5. Many users have also noticed fewer hallucinations and more coherent language output. Even if it is not perfect in every situation, DeepSeek-R1-0528 is generally regarded as a significant step up in quality.
Key enhancements and long-form capabilities
DeepSeek’s newest update brings a variety of other enhancements. Users note that R1-0528’s writing and reasoning have improved in this version. One enthusiast summarized the new strengths as follows:
- Deep reasoning: able to perform step-by-step logical reasoning at a level comparable to Google’s AI models, rather than jumping to conclusions.
- Improved text generation: outputs look more natural and are better formatted, making essays and explanations more fluid.
- A rigorous yet efficient style: the model adopts a unique reasoning style that is thorough and methodical without sacrificing speed, and it is transparent and reliable because it shows its work (a “chain-of-thought” approach; a minimal usage sketch follows this list).
- Long-term focus (or extended focus): the expanded context window lets the model concentrate on a single complex task for an extended period (30-60 minutes), handling long queries or multi-step problems in one go.
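The chain-of-thought point is easy to see in practice. Below is a minimal sketch of querying the model through an OpenAI-compatible endpoint and printing the reasoning trace alongside the final answer. The base URL, the deepseek-reasoner model name, and the reasoning_content field reflect how DeepSeek’s hosted API is commonly described, but treat them as assumptions to verify against your provider, and note that the hosted endpoint may or may not be serving R1-0528 specifically.

```python
# Minimal sketch: ask the model a question and print its visible chain of thought.
# Assumptions: an OpenAI-compatible endpoint at api.deepseek.com, the
# "deepseek-reasoner" model name, and a `reasoning_content` field on the reply.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Is 2^31 - 1 prime? Reason step by step."}],
)

msg = resp.choices[0].message
print("--- chain of thought ---")
print(getattr(msg, "reasoning_content", "<not exposed by this endpoint>"))
print("--- final answer ---")
print(msg.content)
```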
Users who tested the model’s long-context handling report promising results. In one experiment, R1-0528 was given a large document and asked detailed questions about it. The model retrieved and used information within a 32K-token context far more accurately than the previous R1 version. At very large contexts (e.g., 60K tokens) its accuracy declined, but performance within 32K contexts saw a noticeable boost. For most practical purposes – say, tens of thousands of words of reference material – the new model provides reliable answers where the prior model may have struggled. Testers have also noted that R1-0528’s written output is more grounded: an idiosyncrasy of earlier versions, in which the model would insert bizarre “quantum physics” references into unrelated text, appears to be fixed. Writing tasks now read much more naturally, with an appropriate and less random style, which will be a relief for users who rely on the model to draft content.
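For readers who want to run a similar check themselves, here is a rough “needle in a haystack” style harness in the spirit of those community tests. It is not the testers’ exact methodology; it reuses the assumed OpenAI-compatible client and model name from the earlier sketch, and word counts stand in loosely for token counts.

```python
# Rough long-context retrieval check: bury one fact in a pile of filler text and
# see whether the model can find it at different context sizes.
import random
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")

FILLER = "The quick brown fox jumps over the lazy dog. "   # padding sentence
NEEDLE = "The secret launch code is 4417-ZULU."

def build_prompt(approx_words: int) -> str:
    """Bury the needle at a random position inside ~approx_words of filler."""
    chunks = [FILLER] * (approx_words // len(FILLER.split()))
    chunks.insert(random.randrange(len(chunks)), NEEDLE + " ")
    return "".join(chunks) + "\n\nQuestion: What is the secret launch code?"

def retrieved(approx_words: int) -> bool:
    resp = client.chat.completions.create(
        model="deepseek-reasoner",
        messages=[{"role": "user", "content": build_prompt(approx_words)}],
    )
    return "4417-ZULU" in resp.choices[0].message.content

# Compare retrieval accuracy at two rough context sizes.
for words in (20_000, 45_000):
    hits = sum(retrieved(words) for _ in range(5))
    print(f"~{words} words of context: {hits}/5 retrieved")
```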
Low-key release, future plans, and context
This release demonstrates DeepSeek’s characteristically low-profile approach to development. The new model was posted to Hugging Face under an open-source MIT license with little fanfare and, at first, no formal press release or detailed model card. The company instead hinted at the update in a community forum and let word of mouth spread among developers. DeepSeek has used this “stealth” launch strategy before: in March 2025, it quietly uploaded the updated DeepSeek V3-0324 model to Hugging Face, which brought some of R1’s reinforcement-learning techniques to the V3 series to improve its reasoning. In both cases, users and independent testers quickly dissected and shared the new models while DeepSeek remained mostly quiet publicly. One overseas observer called this style “DeepSeek’s consistently low-key manner”.
Interestingly, DeepSeek chose to bill this powerful new model as an incremental R1 upgrade rather than calling it “R2”, which has led to speculation about the company’s versioning policy. Industry insiders believe that because the core architecture was not completely overhauled, the team chose not to declare a full R2 version – possibly reserving “R2” for a future release with more fundamental changes. Some believe R1-0528 is what an R2 would have been, but that it was shipped as an R1 update due to competitive pressure and expectations management. “If this is the R1, how good will the R2 be?” quipped one amazed user, reflecting the excitement in the community. DeepSeek has yet to announce a timeline or any details for an R2 model, leaving AI enthusiasts in suspense as they wait for the company’s next move.
DeepSeek-R1-0528 is now freely available, letting developers experiment with one of the most advanced open-source large language models ever produced. Its ability to match – and in some instances approach or exceed – the output of top-tier proprietary models from OpenAI and Anthropic marks a significant milestone in the AI landscape. The release not only shows the rapid progress of China’s AI startups in the open-source domain but also raises expectations for future models, including a DeepSeek R2. DeepSeek’s low-key but high-impact strategy is worth watching as the race between open and closed AI heats up, and R1-0528 is likely to spark further innovation and collaboration across the global AI community.