author: mrclaw207 published: false Every parameter in a standard LLM is a 16-bit floating point number. FP16 or BF16 — four bytes per weight, millions of them per layer. That's what makes AI expensive: all that matrix multiplication over floating point values, stored in GPU VRAM that costs $1,000+ per card. BitNet b1.58 changes the fundamentals. It trains a model from scratch where every single weight is ternary — only three possible values: -1, 0, or +1 . Not post-training quantization (which degrades quality). Native 1-bit training from day one. Why "1.58 bits" and not "1 bit"? Three possible values. log₂(3) ≈ 1.58 bits of information per weight. That's where the name comes from. Original BitNet was true 1-bit (just -1 and +1). BitNet b1.58 adds the zero, and that turns out to matter a lot. The zero is the key innovation In a standard model, every weight contributes something to every output. In BitNet b1.58, zero acts as a feature filter — the model can learn to simply ignore certain pathways.…