Tales from the Bare Metal, Episode 01

« Thou shalt not trust a backup thou hast not restored! »

At half past eleven on the night of Tuesday, 31 January 2017, an engineer at GitLab.com typed rm -rf on what they believed was the secondary PostgreSQL database. The terminals on their screen were visually identical, save for the hostname in the prompt. Two seconds later, when they realised the prompt did not say what they thought it said, they killed the command. By that point, three hundred gigabytes of production data had been removed.

That was the easy part. The hard part came over the next eighteen hours, as the team discovered, in a sequence that has since become teaching material, that none of their five backup mechanisms had been working.

This is a well-documented incident. GitLab's response, by industry standards, was extraordinary. They live-streamed the recovery on YouTube. They published their internal chat logs.…