Adaptive Compression in Inverted Indexes: What Actually Happens Inside Lucene, Elasticsearch, and Tantivy

DEV Community·우병수·24 days ago

TL;DR: The thing that broke my mental model first wasn't slow queries; it was watching disk I/O climb to 95% utilization on NVMe drives while average query latency jumped from 12 ms to 340 ms on a corpus I'd carefully tuned for months. We were running Elasticsearch 8.

📖 Reading time: ~41 min

What's in this article

The Problem I Kept Running Into: Index Bloat at Scale
Quick Primer: What an Inverted Index Actually Stores
The Core Algorithms You'll Actually Encounter
How Lucene 9.x Actually Picks a Codec
Elasticsearch 8.x: Configuring Compression in Practice
Apache Solr: Where the Controls Are More Exposed
Tantivy (Rust): A Different Approach Worth Knowing
Benchmarking Compression Tradeoffs: What I Actually Measured

The Problem I Kept Running Into: Index Bloat at Scale

The thing that broke my mental model first wasn't slow queries; it was watching disk I/O climb to 95% utilization on NVMe drives while average query latency jumped from 12 ms to 340 ms on a corpus I'd carefully tuned for months.…
