War Story: Scaling Our PostgreSQL 17 Cluster to 10TB for 100M+ Users



DEV Community·ANKUSH CHOUDHARY JOHAL·about 1 month ago


#tip #story #scaling #postgres #postgresql #storage


At 03:14 UTC on November 12, 2024, our primary PostgreSQL 17 write leader hit 100% CPU, p99 write latency spiked to 11.2 seconds, and 14% of user requests for our 100M+ active user platform started failing. We had 8 minutes to fix it before the morning rush in APAC.

Key Insights

PostgreSQL 17’s native columnar storage reduced analytical query time by 78% on 10TB datasets

pgBouncer 1.22 and pg_stat_statements 1.10 were critical for connection and query tuning

Monthly infrastructure costs dropped from $68k to $26k after deprecating legacy sharding logic

PostgreSQL 17’s native logical replication will make cross-cloud failover 40% faster by 2025

The Incident That Started It All

We had been putting off scaling…
