"Your retries are killing us." A service team received this message from a downstream dependency during an outage. The upstream API was timing out, so naturally, the client retried. 3 times, 5 times, 10 times. The client thought it was doing the right thing. From the dependency's perspective, they were at half capacity due to the outage — and receiving several times the normal traffic. Retries were making the outage worse and preventing recovery. This isn't a fable. In August 2012, Knight Capital's trading system activated legacy code (Power Peg) during a deployment, generating millions of orders over 45 minutes. Orders were never marked as "complete," so the system kept regenerating them. The feedback loop never closed. The structural result: an infinite re-execution loop with the same dynamics as a retry storm. $440 million lost, company effectively bankrupt. Retries exist to survive failures. But when designed carelessly, retries become the failure.…