Adaptive Compression in Inverted Indexes: What Actually Happens Inside Lucene, Elasticsearch, and Tantivy

DEV Community·우병수·24 days ago

TL;DR: The thing that broke my mental model first wasn't slow queries; it was watching disk I/O climb to 95% utilization on NVMe drives while average query latency jumped from 12ms to 340ms on a corpus I'd carefully tuned for months. We were running Elasticsearch 8.

📖 Reading time: ~41 min

What's in this article

The Problem I Kept Running Into: Index Bloat at Scale
Quick Primer: What an Inverted Index Actually Stores
The Core Algorithms You'll Actually Encounter
How Lucene 9.x Actually Picks a Codec
Elasticsearch 8.x: Configuring Compression in Practice
Apache Solr: Where the Controls Are More Exposed
Tantivy (Rust): A Different Approach Worth Knowing
Benchmarking Compression Tradeoffs: What I Actually Measured

The Problem I Kept Running Into: Index Bloat at Scale

The thing that broke my mental model first wasn't slow queries; it was watching disk I/O climb to 95% utilization on NVMe drives while average query latency jumped from 12ms to 340ms on a corpus I'd carefully tuned for months.…
