How I Tuned Adaptive Compression for Inverted Indexes and Stopped Wasting 40% of My Disk

DEV Community·우병수·22 days ago

#whats #roaring #tantivy #why #index #codec

TL;DR: The thing that caught me off guard wasn't the query latency; it was the storage invoice. We had a working Elasticsearch cluster, decent relevance tuning, p95 query times under 200ms.

Reading time: ~36 min

What's in this article

The Problem Nobody Warns You About
A Quick Mental Model (Not a Textbook Definition)
The Actual Encoding Algorithms You'll Encounter
What Elasticsearch and OpenSearch Actually Give You to Configure
Hands-On: Measuring Compression Ratio Before You Change Anything
Implementing a Custom Codec in Lucene (When Defaults Aren't Enough)
Roaring Bitmaps: When to Reach for Them Directly
The 3 Things That Surprised Me

The Problem Nobody Warns You About

The thing that caught me off guard wasn't the query latency; it was the storage invoice. We had a working Elasticsearch cluster, decent relevance tuning, p95 query times under 200ms. Then we crossed 100M documents and the disk bill tripled inside of two billing cycles. Not doubled. Tripled.…
