Originally published at llmkube.com/blog/turboquant-m5-max-quality-and-asymmetric . Cross-posted here for the dev.to audience. Yesterday's M5 Max KV cache post drew a clean set of asks in the comments: where are the perplexity numbers, what about KL divergence, did you try asymmetric K/V combos, can you fill the 32K to 128K gap with a 64K row. I ran them overnight on the same hardware. Numbers below. TL;DR q8_0 KV cache is essentially free at 4k context. PPL delta vs f16 is −0.0005 (well inside the ±0.036 stderr). KL is 0.0016. Top-1 token agreement is 98.64%. turbo3 and turbo4 cost real but small quality. turbo3: ~1% PPL increase, 5pp top-token disagreement, KL roughly 12× q8_0. turbo4 sits between, in line with its lower compression ratio. -ctk q8_0 -ctv turbo4 is the new winner for long-context. Matches symmetric q8_0 throughput at every depth tested and fits 512K, where symmetric q8_0 OOM'd. q8_0-grade prefill, turbo4-grade memory ceiling. -ctk f16 -ctv turbo4 is broken on this fork on Metal.…