Menu

Post image 1
Post image 2
1 / 2
0

Postmortem: How a Bug in Google 2026's Kubernetes Engine and AWS EKS 1.35 Caused a Global Outage for 10k Customers

DEV Community·ANKUSH CHOUDHARY JOHAL·30 days ago
#QPJjNlJM
Reading 0:00
15s threshold

Postmortem: How a Bug in Google Kubernetes Engine 2026 and AWS EKS 1.35 Caused a Global Outage for 10k Customers On March 15, 2026, a shared upstream Kubernetes regression in version 1.35 triggered a cascading global outage across Google Kubernetes Engine (GKE) and Amazon Elastic Kubernetes Service (EKS), impacting 10,432 customers over 4 hours and 22 minutes. This postmortem details the timeline, root cause, impact, and remediation of the incident. Executive Summary The outage stemmed from a race condition in the Kubernetes v1.35 PodDisruptionBudget (PDB) admission controller, which incorrectly rejected all pod eviction requests during node maintenance, even when PDB constraints were satisfied. Both GKE and EKS had adopted K8s v1.35 as their default or rolling release in early March 2026, leading to synchronized failures across both managed Kubernetes providers.…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Read More