2026-05-14 9 min read At Cloudflare, we are heavy users of ClickHouse, an open source online analytical processing (OLAP) database. Every day, we make millions of calls to ClickHouse to determine how much users should be billed for their usage of Cloudflare products. If we don't finish those jobs in a timely fashion, the invoices become very difficult to reconcile. This pipeline powers hundreds of millions of dollars in usage revenue, fraud systems, and more, so being delayed has major downstream implications. Which is why it was a big problem when the daily aggregation jobs in ClickHouse – responsible for ensuring Cloudflare’s bills go out – had slowed way down, following a migration. All the usual suspects looked clean: I/O, memory, rows scanned, parts read. Everything we would normally check when a ClickHouse query is slow appeared to be normal. This is the story of how we discovered a hidden bottleneck buried deep within ClickHouse’s internals, and the three patches we wrote to fix it.…