At 00:25 UTC on the morning of May 8, one availability zone of one region of one cloud provider began to fail in a structurally interesting way. The AWS Health Dashboard describes the cause with admirable composure: a thermal event. The site of the thermal event is use1-az4 , an availability zone in the company's Northern Virginia us-east-1 region — a region that is, in The Register's preferred adjective , notorious. The hardware is off. The customer workloads that were running on the hardware are off. The services that are nominally global, but which happen to thread their control plane through us-east-1, are degraded. The dashboard's own description: "EC2 instances and EBS volumes hosted on impacted hardware are affected by the loss of power during the thermal event." And the second sentence: "Other AWS services that depend on the affected EC2 instances and EBS volumes in this Availability Zone may also experience impairments." That second sentence is doing more work than it would like to be doing.…