Incident Report: API Crash Due to Kubernetes 1.36 Node OOM and KEDA 2.15 Scaling Flaw Executive Summary On May 15, 2024, our production REST API experienced a total outage lasting 47 minutes, triggered by a cascade of failures involving a Kubernetes 1.36 node out-of-memory (OOM) event and a latent scaling flaw in KEDA 2.15. The incident resulted in 12,432 failed API requests, 100% traffic unavailability for the duration of the outage, and impact to 8 enterprise customers. No data loss occurred, and the incident consumed 0.03% of our monthly 99.9% SLA error budget. Incident Timeline (All Times UTC) 14:22 : Monitoring alerts triggered for elevated API 5xx error rates, initial spike of 12% failed requests. 14:24 : On-call engineering team confirmed total API unavailability and initiated incident response protocol. 14:26 : Initial investigation identified kubelet OOM kill events on 3 of 5 worker nodes running Kubernetes 1.36.…