# Introduction

TurboQuant is a novel algorithmic suite and library recently launched by Google. It applies advanced quantization and compression to large language models (LLMs) and vector search engines, both indispensable components of retrieval-augmented generation (RAG) systems, to drastically improve their efficiency. TurboQuant has been shown to compress the key-value (KV) cache to as few as 3 bits per value without requiring model retraining or sacrificing accuracy. How does it achieve this, and does it live up to the hype? This article answers these questions by describing how TurboQuant works and walking through a practical example of its use.
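To put the 3-bit figure in perspective, here is a minimal sketch that compares the KV cache footprint at 16 bits per value with the footprint at 3 bits per value. The model configuration (layer count, KV head count, head dimension, context length) is hypothetical and chosen only for illustration. The calculation also ignores any per-block scales or other metadata a real quantizer might store. It is plain arithmetic, not TurboQuant's algorithm.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bits_per_value):
    """Approximate KV cache size in bytes (keys + values).

    Ignores any quantization metadata (scales, offsets, etc.)."""
    # Factor of 2: each layer caches one key tensor and one value tensor.
    num_values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return num_values * bits_per_value / 8


# Hypothetical model configuration, for illustration only.
config = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=32_768, batch_size=1)

fp16 = kv_cache_bytes(**config, bits_per_value=16)
q3 = kv_cache_bytes(**config, bits_per_value=3)

print(f"16-bit KV cache: {fp16 / 2**30:.2f} GiB")  # 16-bit KV cache: 4.00 GiB
print(f"3-bit KV cache: {q3 / 2**30:.2f} GiB")     # 3-bit KV cache: 0.75 GiB
print(f"compression ratio: {fp16 / q3:.2f}x")     # compression ratio: 5.33x
```

Going from 16 to 3 bits per value gives a 16/3 ≈ 5.3x reduction regardless of model size. Any metadata a quantizer stores alongside the compressed values would lower this ratio somewhat.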