I wanted to answer one question: after packed-codebook TurboQuant failed, was there still a credible latency path?

The short answer: there was a real speed ceiling, but no stable quality-preserving implementation path.

TL;DR

- Hardware-friendly int4 K/V passed byte gates but failed real-KV logit quality.
- Qwen2.5-7B work reduction had a real speed ceiling: p_attn=0.334, with 1.20x to 1.21x projected speedup at 5% selector overhead.
- Oracle quality failed anyway: no implementable selector passed all 4 decode steps.
- The lesson was strict: a speed ceiling is only permission to run a quality gate, not permission to implement.…
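
To make the speed-ceiling arithmetic concrete, here is a minimal sketch of an Amdahl-style projection. It is an illustration under stated assumptions, not the method used for the numbers above: the function name `projected_speedup`, the `keep_fraction` parameter, and the choice to express selector overhead as a fraction of baseline attention time are all hypothetical. Only `p_attn=0.334` and the 5% overhead figure come from the text.

```python
def projected_speedup(p_attn: float, keep_fraction: float, selector_overhead: float) -> float:
    """Amdahl-style projection of end-to-end decode speedup.

    p_attn            -- fraction of baseline step time spent in attention (0.334 in the text)
    keep_fraction     -- ASSUMED: fraction of attention work still executed after reduction
    selector_overhead -- ASSUMED definition: selector cost as a fraction of baseline attention time
    """
    if not 0.0 <= p_attn <= 1.0:
        raise ValueError("p_attn must be in [0, 1]")
    # Non-attention time is unchanged; attention time shrinks to the kept
    # fraction plus whatever the selector itself costs.
    new_time = (1.0 - p_attn) + p_attn * (keep_fraction + selector_overhead)
    return 1.0 / new_time


if __name__ == "__main__":
    p_attn = 0.334  # from the text

    # Hard ceiling: attention removed entirely, selector free.
    print(f"ceiling: {projected_speedup(p_attn, 0.0, 0.0):.2f}x")

    # Hypothetical keep fractions (not from the text) at 5% selector overhead.
    for keep in (0.60, 0.45, 0.30):
        print(f"keep={keep:.2f}: {projected_speedup(p_attn, keep, 0.05):.2f}x")
```

Under this model the absolute ceiling is about 1.50x, because two thirds of the step time is not attention at all. Any realistic selector lands well below that, which is why a projected gain in the 1.2x range only justifies running a quality gate.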