Like most teams using GitHub Actions, we’d gotten used to the ritual: push code, wait for CI, see a red build, re-run it, and hope it passes this time. “It’s probably flaky” became the default response to any test failure, including the real ones.

We decided to actually measure the damage. Over 30 days on a single repo:

842 CI runs → 117 failures (a 13.9% failure rate)
31.5 developer hours spent investigating failures and re-running jobs
$426 in CI compute burned on re-runs that shouldn’t have been needed
1 regression shipped to production because a real failure was dismissed as “just flaky”

The worst part? Nobody could tell us which tests were flaky. We had a vague sense (“that login test is weird”) but no actual inventory. And without an inventory, you can’t fix what you can’t see.…
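One common way to start building such an inventory is to treat a test as flaky when it both passed and failed on the same commit: the code did not change, yet the outcome did. This is a minimal sketch of that heuristic, not necessarily the approach this post goes on to describe. The record shape (`RunResult` with `commit_sha`, `test_name`, `passed`) and the helpers `flaky_inventory` and `failure_rate` are hypothetical. In practice you would populate the records from your own CI test reports.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    # Hypothetical record: one test's outcome in one CI run.
    commit_sha: str
    test_name: str
    passed: bool


def flaky_inventory(results):
    """Map test name -> number of commits on which it both passed and failed.

    Sorted with the most frequently flaky tests first.
    """
    # (commit, test) -> set of outcomes observed for that exact code state
    outcomes = defaultdict(set)
    for r in results:
        outcomes[(r.commit_sha, r.test_name)].add(r.passed)

    counts = defaultdict(int)
    for (_, test), seen in outcomes.items():
        if seen == {True, False}:  # same code, different outcome: flaky signal
            counts[test] += 1
    return dict(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))


def failure_rate(total_runs, failed_runs):
    # Fraction of CI runs that ended in failure.
    return failed_runs / total_runs


results = [
    RunResult("abc123", "test_login", False),
    RunResult("abc123", "test_login", True),      # re-run passed on the same commit
    RunResult("abc123", "test_checkout", True),
    RunResult("def456", "test_checkout", False),  # failed, never re-run: not flagged
]

print(flaky_inventory(results))            # {'test_login': 1}
print(f"{failure_rate(842, 117):.1%}")     # 13.9%
```

Note the trade-off in this heuristic: a test that fails and is never re-run on the same commit is never flagged, so it undercounts flakiness, but it also avoids mislabeling a genuine regression (which fails consistently) as flaky.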