AWS Just Took Half the Internet Down Because a Building Got Too Hot

DEV Community·Arthur·24 days ago
#aws #outage #useast1 #customer #region #thermal

At 00:25 UTC on the morning of May 8, one availability zone of one region of one cloud provider began to fail in a structurally interesting way. The AWS Health Dashboard describes the cause with admirable composure: a thermal event. The site of the thermal event is use1-az4, an availability zone in the company's Northern Virginia us-east-1 region, a region that is, in The Register's preferred adjective, notorious. The hardware is off. The customer workloads that were running on the hardware are off. The services that are nominally global, but which happen to thread their control plane through us-east-1, are degraded. The dashboard's own description: "EC2 instances and EBS volumes hosted on impacted hardware are affected by the loss of power during the thermal event." And the second sentence: "Other AWS services that depend on the affected EC2 instances and EBS volumes in this Availability Zone may also experience impairments." That second sentence is doing more work than it would like to be doing.…
