What just happened? Microsoft has introduced BitNet b1.58 2B4T, a new large language model engineered to be extremely efficient. Instead of representing weights with 16- or 32-bit floating-point numbers, BitNet uses only three discrete values: -1, 0, or +1. This approach, known as ternary quantization, allows each weight to be stored in just 1.58 bits. The result is a model that dramatically reduces memory usage and can run far more easily on standard hardware, without the high-end GPUs typically required for large-scale AI.
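To see where the 1.58-bit figure comes from, here is a minimal sketch of absmean-style ternary quantization. The helper below is illustrative, not Microsoft's actual code: each weight is scaled and rounded to -1, 0, or +1, and since a three-valued symbol carries log2(3) ≈ 1.58 bits of information, that is the effective storage cost per weight.

```python
import numpy as np

# Illustrative sketch of absmean-style ternary quantization (hypothetical
# helper, not Microsoft's implementation): scale the weight matrix by its
# mean absolute value, then round every entry to -1, 0, or +1.
def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    scale = np.abs(w).mean() + eps            # per-tensor scaling factor
    q = np.clip(np.round(w / scale), -1, 1)   # each weight becomes -1, 0, or +1
    return q.astype(np.int8), scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = ternary_quantize(w)
print(q)           # entries drawn only from {-1, 0, +1}
print(np.log2(3))  # ≈ 1.585 bits of information per three-valued weight
```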
Microsoft’s General Artificial Intelligence Group developed the BitNet b1.58 2B4T model, which contains two billion parameters, the internal values that allow the model to understand and generate language. To compensate for its low-precision weights, the model was trained on a dataset of four trillion tokens, roughly equivalent to the contents of 33 million books. This extensive training allows BitNet to perform on par with, and in some cases better than, other leading models of similar size, such as Meta's Llama 3.2 1B, Google's Gemma 3 1B, and Alibaba's Qwen 2.5 1.5B.
BitNet b1.58 exhibited strong performance across benchmark tests, including grade-school math problems and questions requiring common-sense reasoning. In some evaluations, it even outperformed its competitors.
What really sets BitNet apart is its memory efficiency. The model requires only about 400MB of memory, less than a third of what comparable models typically need, and it can run on standard CPUs, including Apple's M2, without the need for high-end GPUs.
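That 400MB figure roughly matches a back-of-envelope estimate of what two billion ternary weights should occupy. The numbers below are a theoretical minimum for weight storage only, ignoring embeddings, activations, and packing overhead, with a half-precision figure included purely for contrast:

```python
# Rough back-of-envelope: weight storage only, ignoring embeddings,
# activations, KV cache, and packing overhead.
params = 2_000_000_000   # ~2 billion weights
ternary_bits = 1.58      # log2(3) bits per ternary weight
fp16_bits = 16           # bits per conventional half-precision weight

print(params * ternary_bits / 8 / 1e6)  # ≈ 395 MB of ternary weights
print(params * fp16_bits / 8 / 1e9)     # ≈ 4 GB for the same weights in FP16
```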
The custom software framework bitnet.cpp is responsible for this level of efficiency. It is optimized to take full advantage of the model's ternary weights, providing fast and lightweight performance on everyday computing devices.
The framework can be found on GitHub. It is currently optimized for CPUs, but future updates will add support for other processor types.
The idea of reducing model precision to save memory is not new; researchers have been exploring model compression for years. Most past attempts, however, converted full-precision models after training, which often came at the expense of accuracy. BitNet b1.58 takes a different approach: it is trained from scratch using only the three weight values -1, 0, and +1, allowing it to avoid many of the performance losses seen in older methods.

This shift has important implications. Large AI models typically require powerful hardware and a lot of energy, which increases costs and impacts the environment. BitNet uses less energy because it relies on simple calculations, mainly additions rather than multiplications. Microsoft researchers estimate that it uses up to 96 per cent less energy than comparable full-precision models. This could allow advanced AI to run directly on mobile devices without the need for supercomputers in the cloud.
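To illustrate why ternary weights cut out multiplications, consider a single dot product. The function below is a hypothetical sketch, not part of bitnet.cpp: when a weight can only be -1, 0, or +1, the computation reduces to adding some inputs, subtracting others, and skipping the rest.

```python
import numpy as np

# Sketch of why ternary weights are cheap: a dot product against weights in
# {-1, 0, +1} needs no multiplications, only additions and subtractions.
def ternary_dot(x: np.ndarray, w: np.ndarray) -> float:
    return x[w == 1].sum() - x[w == -1].sum()  # zero weights are simply skipped

x = np.array([0.5, -1.2, 3.0, 0.7])
w = np.array([1, 0, -1, 1], dtype=np.int8)
print(ternary_dot(x, w))  # -1.8
print(float(x @ w))       # same result via a conventional multiply-add
```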
BitNet b1.58 is not without its limitations. It currently supports only specific hardware and requires the custom bitnet.cpp framework. Its context window, the amount of text it can process at once, is also smaller than that of the most advanced models.
Researchers are still investigating why the model performs so well with such a simplified architecture. Future work will expand its capabilities, including support for more languages and longer text inputs.