Postmortem: A Corrupted Loki 2.10 Log Store Caused 3 Days of Lost Debug Data

Overview

On October 12, 2024, at 09:15 UTC, our observability team detected anomalies in the Loki 2.10 log store cluster serving production debug data. By the time mitigation steps were fully effective, 3 full days of debug log data (October 12–14) had been corrupted and was unrecoverable, impacting incident response and root cause analysis for 14 concurrent production incidents.

Impact

The corrupted log store affected all production services that write debug-level logs to the Loki 2.10 cluster, including:

- 142 microservices across 3 cloud regions
- CI/CD pipeline debug logs for 217 failed deployments
- Customer support ticket debug traces for 892 high-priority tickets

Total unrecoverable data loss: ~4.2 TB of debug logs. No customer-facing services were disrupted, as debug logs are non-critical for production runtime, but engineering velocity dropped by 62% due to lack of debug context for incident resolution.…