This is Part 2 of a two-part technical write-up. Part 1 ended with the key architecture lesson: a smaller KV cache is not automatically a faster attention path. That lesson pushed us toward compressed-attention execution instead of storage-only cache compression.

We built a stable compressed-key baseline. It was not fast enough to be the final answer, but it was coherent, and it gave us a way to separate the key side of attention from the value side.

One distinction matters for the rest of the post: the experiments below are not a verdict on the official TurboQuant paper or on every possible fused implementation. They are a verdict on the eager value-path family we built in this transformers fork.

With the key side separated out, the real problem became clear: even with compressed keys, the model still has to mix historical values. This post is about the experiments that tried to make that value path cheaper. None of them became the final answer, but each one taught something useful.…
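The claim that compressed keys leave the value mix untouched can be seen in a toy single-query decode step. The sketch below is a minimal NumPy illustration under stated assumptions: a simple per-token symmetric int8 quantizer stands in for the real key compression (it is not TurboQuant's scheme), and the helper names (`quantize_keys_int8`, `attend_storage_only`, `attend_compressed_keys`) are hypothetical, not functions from the transformers fork.

```python
import numpy as np


def quantize_keys_int8(K):
    """Per-token symmetric int8 quantization of keys (hypothetical helper).

    Illustrative stand-in only: this is NOT TurboQuant's quantizer.
    K has shape (T, d); returns int8 codes (T, d) and float32 scales (T, 1).
    """
    scale = np.abs(K).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale).astype(np.float32)  # guard all-zero rows
    codes = np.round(K / scale).astype(np.int8)
    return codes, scale


def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()


def attend_exact(q, K, V):
    """Reference single-query attention over T cached tokens, full-precision keys."""
    scores = K @ q / np.sqrt(q.shape[0])
    return softmax(scores) @ V


def attend_storage_only(q, codes, scale, V):
    """Storage-only compression: dequantize every cached key, then attend as usual."""
    K_hat = codes.astype(np.float32) * scale  # rebuilds a full (T, d) float key matrix
    scores = K_hat @ q / np.sqrt(q.shape[0])
    return softmax(scores) @ V  # value mixing: weighted sum over all T value rows


def attend_compressed_keys(q, codes, scale, V):
    """Compressed-key scoring: apply per-token scales to the T scores instead of
    to the (T, d) key matrix. NumPy upcasts the int8 codes inside matmul; a fused
    kernel would read them directly."""
    scores = (codes @ q) * scale[:, 0] / np.sqrt(q.shape[0])
    return softmax(scores) @ V  # value side is unchanged: still all T value rows


# Toy decode step: one query against T cached tokens of head dimension d.
rng = np.random.default_rng(0)
T, d = 128, 64
q = rng.standard_normal(d, dtype=np.float32)
K = rng.standard_normal((T, d), dtype=np.float32)
V = rng.standard_normal((T, d), dtype=np.float32)

codes, scale = quantize_keys_int8(K)
out_exact = attend_exact(q, K, V)
out_store = attend_storage_only(q, codes, scale, V)
out_ckey = attend_compressed_keys(q, codes, scale, V)
```

In this toy setting both compressed variants produce the same scores up to float rounding, but the storage-only variant first rebuilds the full key matrix, which is why a smaller cache alone does not buy speed. In every variant, though, the final `softmax(scores) @ V` still reads all T cached value rows, and that is the cost the experiments in this post go after.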