TurboQuant: A First-Principles Walkthrough

arkaung.github.io·@HashtagPLUS·about 1 month ago

Compressing AI vectors to 2–4 bits per number without losing accuracy.

Modern language models store large tables of high-dimensional vectors: KV caches, embeddings, attention keys. TurboQuant compresses each coordinate of these vectors to 2–4 bits with provably near-optimal distortion, no memory overhead for scale factors, and no training or calibration. This page explains how it works.

The single load-bearing idea: in high dimensions, a random rotation turns every input vector into one whose coordinates follow a known fixed distribution. A codebook designed once for that distribution can then be reused for every input. Everything else on this page is the construction that puts this observation to work.

§0 · Primer: jargon decoder

Eight ideas the rest of the page is built on. Each mini-demo below covers one concept used later. Skip the ones you already know.

§0.1 · Vector

A list of numbers. An arrow in space.

A vector is an ordered list: [0.3, −1.2].…
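To make the load-bearing idea stated earlier concrete, here is a minimal sketch of what a random rotation does to a deliberately awkward input. It is an illustration under stated assumptions, not TurboQuant's actual construction: the helper names `random_rotation` and `rotate`, the dimension `d = 64`, and the use of Gram-Schmidt on Gaussian rows to build the rotation are all choices made for this example. The input puts all of its length into one coordinate; after rotation the length is unchanged, but it is spread across all coordinates.

```python
import math
import random


def random_rotation(d, rng):
    """Build a d x d orthogonal matrix (list of rows).

    Assumption for this sketch: orthonormalize Gaussian random rows
    with Gram-Schmidt. Other constructions are possible.
    """
    rows = []
    for _ in range(d):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components along the rows already chosen.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def rotate(matrix, x):
    """Matrix-vector product: apply the rotation to x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in matrix]


rng = random.Random(0)  # fixed seed so the run is repeatable
d = 64

# Hard case for a fixed codebook: one huge coordinate, the rest zero.
x = [1.0] + [0.0] * (d - 1)

y = rotate(random_rotation(d, rng), x)

norm_sq = sum(v * v for v in y)       # rotation preserves length
largest = max(abs(v) for v in y)      # no single coordinate dominates

print(f"squared norm after rotation: {norm_sq:.6f}")
print(f"largest |coordinate|:        {largest:.3f}")
print(f"typical scale 1/sqrt(d):     {1 / math.sqrt(d):.3f}")
```

Because the rotated coordinates all sit on a scale of about 1/sqrt(d) no matter which unit vector went in, one set of quantization levels chosen for that scale can serve every input, which is the reuse the article describes.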
