TL;DR: The thing that caught me off guard wasn't the query latency — it was the storage invoice. We had a working Elasticsearch cluster, decent relevance tuning, p95 query times under 200ms. 📖 Reading time: ~36 min What's in this article The Problem Nobody Warns You About A Quick Mental Model (Not a Textbook Definition) The Actual Encoding Algorithms You'll Encounter What Elasticsearch and OpenSearch Actually Give You to Configure Hands-On: Measuring Compression Ratio Before You Change Anything Implementing a Custom Codec in Lucene (When Defaults Aren't Enough) Roaring Bitmaps: When to Reach for Them Directly The 3 Things That Surprised Me The Problem Nobody Warns You About The thing that caught me off guard wasn't the query latency — it was the storage invoice. We had a working Elasticsearch cluster, decent relevance tuning, p95 query times under 200ms. Then we crossed 100M documents and the disk bill tripled inside of two billing cycles. Not doubled. Tripled.…