TurboQuant: A First-Principles Walkthrough

Compressing AI vectors to 2–4 bits per number without losing accuracy.

Modern language models store large tables of high-dimensional vectors: KV caches, embeddings, attention keys. TurboQuant compresses each coordinate of these vectors to 2–4 bits with provably near-optimal distortion, no memory overhead for scale factors, and no training or calibration. This page explains how it works.

The single load-bearing idea: in high dimensions, a random rotation turns every input vector into one whose coordinates follow a known, fixed distribution. A codebook designed once for that distribution can then be reused for every input. Everything else on this page is the construction that puts this observation to work.

§0 · Primer: jargon decoder

Eight ideas the rest of the page is built on. Each mini-demo below covers one concept used later. Skip the ones you already know.

§0.1 · Vector

A list of numbers. An arrow in space.

A vector is an ordered list: [0.3, −1.2]. …
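The load-bearing idea from the introduction can be checked in a few lines, using vectors in exactly this sense (ordered lists of numbers). The sketch below is illustrative, not TurboQuant's implementation. It assumes unit-norm inputs and treats each rotated coordinate as roughly N(0, 1/d). It then uses the classic 2-bit Lloyd-Max codebook for a Gaussian as a stand-in for whatever codebook TurboQuant actually uses. All helper names (`random_rotation`, `quantize`, `dequantize`, and so on) are hypothetical. The sketch quantizes a "spiky" vector (all energy in one coordinate) and a dense random vector, each with and without a shared random rotation, using one fixed codebook for every input.

```python
import math
import random

# 2-bit Lloyd-Max reconstruction levels for a standard Gaussian (Max, 1960).
# Assumption: rotated coordinates are treated as ~N(0, 1/d); this codebook is
# an illustrative stand-in, not necessarily the one TurboQuant uses.
GAUSS_2BIT = (-1.5104, -0.4528, 0.4528, 1.5104)


def random_rotation(d, rng):
    """Haar-random d x d orthogonal matrix via Gram-Schmidt on Gaussian rows."""
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for q in rows:  # remove the component along each accepted row
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # a degenerate draw is astronomically unlikely
            rows.append([a / norm for a in v])
    return rows


def rotate(R, x):
    """y = R x"""
    return [sum(a * b for a, b in zip(row, x)) for row in R]


def unrotate(R, y):
    """x = R^T y, which inverts rotate() because R is orthogonal."""
    d = len(R)
    return [sum(R[i][j] * y[i] for i in range(d)) for j in range(d)]


def quantize(y, d):
    """Map each coordinate to the index (0..3, i.e. 2 bits) of the nearest level."""
    levels = [c / math.sqrt(d) for c in GAUSS_2BIT]
    return [min(range(4), key=lambda k: abs(v - levels[k])) for v in y]


def dequantize(idx, d):
    """Replace each 2-bit index with its (scaled) codebook level."""
    return [GAUSS_2BIT[k] / math.sqrt(d) for k in idx]


def sq_error(x, x_hat):
    """Squared error; equals relative error here because inputs are unit-norm."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def unit(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


d = 64
rng = random.Random(0)
R = random_rotation(d, rng)  # drawn once, shared by every vector

spiky = [1.0] + [0.0] * (d - 1)                        # all energy in one coordinate
dense = unit([rng.gauss(0.0, 1.0) for _ in range(d)])  # energy spread out

for name, x in [("spiky", spiky), ("dense", dense)]:
    naive = sq_error(x, dequantize(quantize(x, d), d))
    x_hat = unrotate(R, dequantize(quantize(rotate(R, x), d), d))
    print(f"{name}: no rotation {naive:.3f}, with rotation {sq_error(x, x_hat):.3f}")
```

Without rotation, the spiky vector's error is about 0.86, because the fixed codebook is tuned for small, spread-out coordinates and cannot represent a single large one. After rotation, both vectors should land near the 2-bit Gaussian Lloyd-Max distortion of roughly 0.12. In this sketch the rotation matrix is stored once for all vectors, and the scale 1/√d depends only on the dimension. Each vector therefore costs just its 2-bit indices.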