Menu

Post image 1
Post image 2
1 / 2
0

The Postmortem of a 20-Minute Kafka 3.8 Outage That Delayed 1M Order Messages

DEV Community·ANKUSH CHOUDHARY JOHAL·about 1 month ago
#WI8oYFGS
#outage#code#kafka#why#broker#producer
Reading 0:00
15s threshold

On October 12, 2024, a misconfigured Kafka 3.8 broker idempotent write setting caused a 20-minute total outage that delayed 1,047,892 order messages, cost $42k in SLA penalties, and exposed a critical gap in our rolling upgrade validation pipeline. 📡 Hacker News Top Stories Right Now LLMs consistently pick resumes they generate over ones by humans or other models (34 points) How fast is a macOS VM, and how small could it be? (144 points) Why does it take so long to release black fan versions? (497 points) Barman – Backup and Recovery Manager for PostgreSQL (31 points) Refusal in Language Models Is Mediated by a Single Direction (21 points) Key Insights Kafka 3.8’s default idempotent producer retry backoff of 100ms combined with a 30-second broker failover window caused 89% of in-flight messages to expire before redelivery Kafka 3.8.1 (released 14 days post-outage) patches the default retry.backoff.ms to 500ms and adds upgrade pre-flight checks for idempotent write compatibility Implementing the fix reduced…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Read More