The KVCache.AI team from Tsinghua University, in partnership with APPROACHING.AI, announced a major upgrade to the KTransformers project last week. Users can now run the full-powered DeepSeek-R1 and V3 models locally on a single 24GB NVIDIA GPU. Pre-processing speed can reach 286 tokens per second, while inference generation reaches 14 tokens per second.
What it means: Users mostly access DeepSeek-R1 through cloud services or local deployment. However, the official servers are often down, and personal deployments typically run a distilled version with over 90% fewer parameters. Most users find it difficult to run the full version of DeepSeek-R1 on standard hardware, and renting servers is a burden even for developers. KTransformers, an open-source project, offers an affordable alternative.
Details: KTransformers is a solution that breaks large AI models' reliance on expensive cloud servers, National Business Daily reported.
- According to a user who analyzed the costs of the solution, running DeepSeek-R1 locally was possible for less than RMB 70,000 ($9,650) – over 95% cheaper than an NVIDIA A100/H100 server, which can cost as much as RMB 2,000,000 ($288,000).
- KTransformers optimizes the deployment of large language models (LLMs) on local machines to overcome resource limitations. The framework uses techniques such as heterogeneous computation, advanced quantization, and sparse-attention mechanisms to improve computational efficiency when processing long context sequences. The report noted that KTransformers' inference speed cannot match that of high-end servers, and it can only serve one user at a time, whereas servers can serve dozens of users simultaneously.
- The overall solution relies on Intel's AMX instruction set; CPUs from other brands cannot yet perform these operations. The solution was designed for DeepSeek's MoE models and may not perform optimally on other mainstream models.
- The Chinese media outlet IThome stated that the KTransformers setup requires the following: an Intel Xeon Gold 6454S CPU with two NUMA nodes, 1TB of standard DDR5-4800 server memory, an RTX 4090D GPU with 24GB of VRAM, and CUDA 12.1 or higher.
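To illustrate one of the techniques listed above, here is a minimal sketch of symmetric int8 weight quantization, the general idea behind compressing model weights to fit on smaller hardware. This is an assumption-laden toy example, not KTransformers' actual implementation (the project uses more advanced schemes).

```python
# Toy sketch of symmetric int8 quantization -- NOT KTransformers' real code.
# Floats are mapped to 8-bit integers plus one per-tensor scale factor,
# cutting memory use roughly 4x versus float32 at a small accuracy cost.

def quantize_int8(weights):
    """Return (int8 values, scale) for a list of float weights."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid zero scale
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values."""
    return [q * scale for q in quantized]

weights = [0.82, -1.27, 0.05, 0.63]
q, scale = quantize_int8(weights)          # q values all fit in [-127, 127]
approx = dequantize(q, scale)              # close to the original weights
```

The design trade-off is the one the article describes: smaller, cheaper memory footprints in exchange for some precision, which is why quantized local deployments lag behind full-precision server inference.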
The context: DeepSeek-R1 was released on Jan. 20. The release made headlines all over the world, and many people believed that the AI industry had entered a new phase, where competition is global, open-source is thriving, and cost-efficiency is a major factor in the development and deployment of AI systems.
- According to the published API (Application Programming Interface) pricing, DeepSeek-R1 costs RMB 1 ($0.14) per million cached input tokens, RMB 4 ($0.55) per million uncached input tokens, and RMB 16 ($2.21) per million output tokens. This is about 1/30th the cost of OpenAI's GPT-4.
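The per-million-token rates above can be turned into a simple cost estimate. The helper below is a hypothetical illustration (the function name and example token counts are assumptions, not part of DeepSeek's API):

```python
# Hypothetical cost calculator using the published DeepSeek-R1 API rates
# (RMB per million tokens: 1 cached input, 4 uncached input, 16 output).

RMB_PER_M_CACHED_IN = 1
RMB_PER_M_UNCACHED_IN = 4
RMB_PER_M_OUT = 16

def request_cost_rmb(cached_in, uncached_in, out_tokens):
    """Estimate the RMB cost of a request from its raw token counts."""
    return (cached_in * RMB_PER_M_CACHED_IN
            + uncached_in * RMB_PER_M_UNCACHED_IN
            + out_tokens * RMB_PER_M_OUT) / 1_000_000

# Example: 2M cached input + 1M uncached input + 0.5M output tokens
# costs 2*1 + 1*4 + 0.5*16 = 14 RMB.
cost = request_cost_rmb(2_000_000, 1_000_000, 500_000)
```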
Jessie Wu is a Shanghai-based tech reporter. She covers the gaming, semiconductor, and consumer electronics industries for TechNode. Connect with her via e-mail: [email protected].