DeepSeek-V3 runs at 20 tokens/second on Mac Studio. That's a nightmare situation for OpenAI.

March 24, 2025 at 12:50 PM (19659002)

Learn More

Subscribe to our daily and weekly emails for the latest updates on industry-leading AI content. Learn More

Chinese startup in AI DeepSeek quietly released a large language model, which is already causing ripples in the artificial intelligence industry. Not just because of its capabilities, but also for how it is being deployed. The 641-gigabyte, or “Dream” model, is already causing ripples in the artificial intelligence industry. DeepSeek V3-0324appeared on AI repository Hugging Face was released today with little to no announcement. This is in line with the company’s recent low-key, but impactful releases.

The model’s uniqueness is what makes this launch so special. My License – making it freely available for commercial usage — and early reports that Apple’s consumer-grade hardware can run it directly. Mac Studio M3 Ultra chip.

The new Deep Seek 0324 in 4-bit runs> 20 toks/sec. on a 512GB m3 Ultra with mlx – lm! pic.twitter.com/wFVrFCxGS6

– All these hands (@awnihannun)””https://twitter.com/awnihannun/status/1904177084609827054?ref_src=twsrc%5Etfw””> March 24, 2025

The new DeepSeek V3-0324 in 4 bit runs at>20 tokens/second with a 512GB m3 Ultra with mlx lm! wrote AI researcher Awni hanun is on social media. While the $9.499 Mac Studio may stretch the definition “consumer hardware”the ability to run a massive model on a local computer is a significant departure from the requirements of state-of-the art AI, which typically require data centers.

DeepSeek’s stealth launch strategy disrupts AI market expectations

The 685-billion-parameter model arrived with no accompanying whitepaper, blog post, or marketing push — just an empty README file (19459114) and the model weights themselves. This contrasts sharply to the carefully orchestrated product releases typical of Western AI firms, where months of hype are often released before actual releases.

Early users report significant improvements compared to the previous version. AI researcher Xeophon (19459114) declared in a post at X.com that he had tested the new DeepSeek V3 in his internal bench, and it showed a big jump in all metrics for all tests. It is now the best model without reasoning, dethroning Sonnet 3.5″

I tested the new DeepSeek V3 in my internal bench, and it had a huge leap in all metrics for all tests.
This is the best non-reasoning version, dethroning Sonnet 3.

Congratulations @Deepseek_ai ! pic.twitter.com/efEu2FQSBe

— Xeophon (@TheXeophon) March 24, 2025

If this claim is validated through broader testing, DeepSeek’s model would be positioned above the competition. Claude Sonnet 3.5 from Anthropic is one of the most highly regarded commercial AI systems. And unlike Sonnet which requires a monthly subscription, Weights for DeepSeek-V3-0324 (19459114) are available for download and use by anyone.

How DeepSeek V3-0324 architecture’s breakthrough achieves unmatched efficiency (19659016) DeepSeek V3-0324 is a search engine that uses a MoE architecture is a mixture-of expertsthat fundamentally changes the way large language models work. DeepSeek activates only 37 billion of the 685 billion parameters in specific tasks, unlike traditional models that activate all their parameters for every task. This selective activation represents an important paradigm shift in the efficiency of models. DeepSeek’s performance is comparable to that of much larger models with fully activated parameters, while requiring a fraction of the computational power.

This model incorporates two other breakthrough technologies: Multi-Head latent Attention and Multi-Token Prediction(MTP). MLA improves the model’s capability to maintain context over long passages of texts, while MTP generates more tokens in each step than the usual one at a time approach. Together, these innovations increase output speed by almost 80%.

Simon Willison (19459114), a developer tool creator, noted in an article that a 4-bit quantumized version reduces storage footprint to 352GB. This makes it possible to run the software on high-end consumer hardware such as the Mac Studio M3 Ultra chip .

The deployment of AI could be radically altered by this technology. Mac Studio uses less than 200 Watts for inference, compared to the traditional AI infrastructure that relies on Nvidia GPUs. This efficiency gap may require the AI industry to rethink its assumptions about infrastructure requirements in order to achieve top-tier model performances.

China’s open-source AI revolution challenges Silicon Valley’s closed-garden model

DeepSeek’s release strategy illustrates a fundamental difference in AI business philosophy between Chinese companies and Western ones. While U.S. leadership likes Openai Anthropic keeps their models behind paywalls. Chinese AI companies are increasingly adopting permissive open source licensing.

The Chinese AI ecosystem is being transformed by this approach. The open availability of cutting edge models creates a multiplyer effect that allows startups, researchers and developers to build on sophisticated AI technology without major capital expenditure. This has dramatically increased China’s AI capability at a rate that has surprised Western observers.

This strategy is based on the realities of the Chinese market. When there are many well-funded competitors offering similar capabilities, it becomes more difficult to maintain a proprietary strategy. Open-sourcing creates new value through ecosystem leadership, APIs, and enterprise solutions based on freely available foundation models.

Even the established Chinese tech giants are recognizing this shift. Baidu announced plans for its Ernie 4.5 Model SeriesOpen-Source by June Alibaba Tencenthas released open-source AI model with specialized capabilities. This is a stark contrast to Western leaders’ API-centric strategies.

Open-source also addresses the unique challenges faced by Chinese companies in AI. Due to restrictions on accessing cutting-edge Nvidia chip technology, Chinese firms have focused on efficiency and optimization in order to achieve competitive performance using limited computational resources. This innovation, which was driven by necessity, has now become an advantage.

DeepSeek V3-0324 : the foundation for an AI reasoning Revolution

– The timing and characteristics DeepSeek V3-0324 is a strong candidate for the foundation of DeepSeek. DeepSeek-R2 (19459114), an improved reasoning-focused version, is expected to be released within the next two weeks. This follows DeepSeek’s established pattern where base models are released several weeks before specialized reasoning models.

This is consistent with the release of V3 around Christmas, followed by R1 several weeks later. Reddit user noted that R2 was rumored to be released in April, so this could be the case. mxforest .

It is impossible to overstate the implications of an advanced, open-source reasoning system. Current reasoning models such as OpenAI’s o1 (19459114) and DeepSeek R1 represents the cutting edge of AI capabilities. It demonstrates unprecedented problem-solving ability in domains ranging from mathematics to coding. This technology would allow AI systems to be accessible to all, not just those with large budgets.

The R2 model could be released amid new revelations about the computational requirements of reasoning models. Nvidia CEO Jensen Huang noted recently that DeepSeek R1 model is ” It consumes 100 times as much compute than a nonreasoning AI contradicting previous industry assumptions about efficiency. DeepSeek models are able to deliver competitive performance despite operating with greater resource constraints than Western counterparts. If DeepSeek R2 follows the same trajectory as R1, it could pose a direct threat to GPT-5 is rumored to be OpenAI’s new flagship model. It will be released in the coming months. OpenAI’s heavily-funded, closed approach and DeepSeek’s open, resource efficient strategy represent two competing visions of AI’s future.

DeepSeek V3-0324: a complete guide for developers

and users. DeepSeek-V3-0324 (19459114) has several paths depending on the technical requirements and resources. The complete model weights can be found at Hugging Face is a 641GB download, which makes it only suitable for those with large storage and computing resources.

Cloud-based options are the most accessible for most users. OpenRouteroffers free API access with a chat interface that is easy to use. Select DeepSeek V3 0 324 as the model and begin experimenting.

DeepSeek’s own chat interface is at The company has not confirmed this, but chat.deepseek.com (19459114) has also likely been updated to the latest version. Early users report that the model can be accessed through this platform, with better performance than previous versions.

Developers who want to integrate the model into their applications can access it via various inference providers. Hyperbolic Labshas announced immediate availability of “the first inference providers serving this model on hugging face”while OpenRouter provides API access compatible with Hugging Face. Openai sdk

DeepSeek V3-0324 is now available on Hyperbolic.

Hyperbolic is committed to delivering open-source models the moment they are available. This is our commitment to the developer community.

Start interpreting today. pic.twitter.com/495xf6kofa

— Hyperbolic (@hyperbolic_labs) March 24, 2025 (19659039)

DeepSeek’s latest model emphasizes technical precision above conversational warmth.

Early reports have noted a noticeable change in the model’s style of communication. “While previous DeepSeek models have been praised for their conversational and human-like tone,” V3-0324 (19459114]” presents a more formal and technically-oriented persona.

Reddit user asked: “Is it just me or does this version seem less human?” nother_level . “For me, the thing that set deepseek v3 apart from other was that it felt like a human. It was not robotic in the way it sounded like other llms, but now that this version is out there its robotic af.” AppearanceHeavy6724,added: “Yeah it lost its aloof appeal for sure, and it feels too intelligent for its own good.” This personality shift is likely the result of deliberate design decisions by DeepSeek engineers. The shift to a more precise and analytical communication style suggests that the model is being repositioned for professional and technical applications, rather than casual conversations. This is in line with the broader industry trend, as AI developers are increasingly aware that different use cases require different interaction styles.

This more precise communication style could be an advantage for developers building specialized apps, as it provides clearer and more consistent results that can be integrated into professional workflows. It may limit its appeal for applications that are aimed at customers, where warmth and approachability is valued.

How DeepSeek’s open source strategy is redrawing global AI landscape

DeepSeek’s approach to AI distribution and development represents more than just a technical achievement – it embodies an entirely different vision of how advanced technology should be propagated through society. DeepSeek’s open-source strategy enables exponential innovation by making cutting edge AI freely available with permissive licensing. Closed models are inherently constrained. This philosophy is rapidly closing what many perceive as the AI gap between China, and the United States. Just a few months ago, analysts estimated that China was 1-2 years behind the U.S. AI capability. Today, the gap has shrunk dramatically to 3-6 months. Some areas are even approaching parity, or even Chinese leadership.

Android’s impact on mobile ecosystem is striking. Google’s decision of making Android free created a platform which ultimately achieved a dominant global market share. Open-source AI models could also outcompete closed systems due to their sheer ubiquity, and the collective innovation from thousands of contributors.

The implications go beyond market competition and include fundamental questions about access to technology. Western AI leaders are increasingly criticized for concentrating advanced technologies among corporations and individuals with high resources. DeepSeek’s method distributes these capabilities to a wider audience, potentially accelerating AI adoption.

As DeepSeek-V3-0324 (19459114) is now available in research labs and developer workstations around the world. The competition is not about who can build the most powerful AI anymore, but rather about who can enable the most people to create with AI. DeepSeek’s release is a quiet one, but it speaks volumes for the future of artificial intelligent. The company that shares the most freely its technology may have the greatest influence on how AI reshapes this world.

VB Daily provides daily insights on business use-cases

Want to impress your boss? VB Daily can help. We provide you with the inside scoop about what companies are doing to maximize ROI, from regulatory changes to practical deployments.

Read our privacy policy

Thank you for subscribing. Click here to view more VB Newsletters.

An error occured.

DeepSeek-V3 runs at 20 tokens/second on Mac Studio. That’s a nightmare situation for OpenAI.

DeepSeek’s stealth launch strategy disrupts AI market expectations

China’s open-source AI revolution challenges Silicon Valley’s closed-garden model

DeepSeek V3-0324 : the foundation for an AI reasoning Revolution

DeepSeek V3-0324: a complete guide for developers

DeepSeek’s latest model emphasizes technical precision above conversational warmth.

How DeepSeek’s open source strategy is redrawing global AI landscape

African startups have $60B in return. How will they do it?

Google Launches New AI Scam detection in Circle to Search, Google...

Black Friday deals under 50 dollars: Apple AirTags Legos Ugreen chargers...

Google rolling out Gemini 3 Deep Think for AI Ultra

Recomended

African startups have $60B in return. How will they do it?

Google Launches New AI Scam detection in Circle to Search, Google Lens and Google Lens

Black Friday deals under 50 dollars: Apple AirTags Legos Ugreen chargers Blink cameras and other items

Google rolling out Gemini 3 Deep Think for AI Ultra

OpenAI says ChatGPT can save the average worker an hour per day

OpenAI boasts enterprise win days after internal ‘code red’ on Google threat