The Closed-Loop Budget Brake: How a $5k Daily Cap Stopped 2 A.M. Compute Runaways

DEV Community·Muskan·18 days ago

The 2 a.m. compute runaway is the canonical FinOps incident. A Spark job is misconfigured to provision new EMR nodes every minute it cannot find a leader. A test agent left running on a developer's laptop calls Claude in an infinite loop using the production API key. An autoscaling group's max gets bumped from 20 to 2000 in a Terraform plan where nobody looked at the one line that mattered. Everyone is asleep. The hourly spend goes from $63 to $830 to $4,200. By 9 a.m. the team gets a Slack ping from finance asking why yesterday's bill spiked by $47,000.

AWS Budgets fires a soft alert when daily spend crosses a threshold. The alert goes to an SNS topic that emails a distribution list and pings a Slack channel. Nobody reads the channel at 2 a.m. The on-call engineer is paged for production outages, not budget overages. By the time someone sees the alert, the damage is hours old: the runaway has either burned itself out or is still running, because the alert never actually stopped anything.…
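To make the difference between an alert and a brake concrete, here is a minimal Python sketch that replays a hypothetical hourly spend trace against the $5k daily cap from the title. The trace (the article's $63, $830, and $4,200 hourly figures, followed by an assumed plateau at $4,200/hour) and the functions `open_loop` and `closed_loop` are illustrative assumptions, not the author's implementation. The "brake" here simply stops all further spend once the cumulative daily total crosses the cap.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical hourly spend trace in USD (an assumption for illustration):
# it climbs through the article's $63 -> $830 -> $4,200 figures, then is
# assumed to hold at $4,200/hour until someone wakes up.
HOURLY_SPEND = [63.0, 830.0, 4200.0, 4200.0, 4200.0, 4200.0, 4200.0, 4200.0]
DAILY_CAP_USD = 5_000.0  # the $5k daily cap from the title


def open_loop(hourly: Sequence[float], cap: float) -> Tuple[float, Optional[int]]:
    """Alert-only: note the hour the cap is crossed, but keep spending."""
    total = 0.0
    alerted_at: Optional[int] = None
    for hour, spend in enumerate(hourly):
        total += spend
        if alerted_at is None and total >= cap:
            alerted_at = hour  # an email/Slack message goes out; nothing stops
    return total, alerted_at


def closed_loop(hourly: Sequence[float], cap: float) -> Tuple[float, Optional[int]]:
    """Brake: once cumulative spend crosses the cap, no further hours are spent."""
    total = 0.0
    for hour, spend in enumerate(hourly):
        total += spend
        if total >= cap:
            # The hour that crosses the cap is already spent; the brake
            # prevents every hour after it.
            return total, hour
    return total, None


if __name__ == "__main__":
    for name, fn in (("open loop", open_loop), ("closed loop", closed_loop)):
        total, hour = fn(HOURLY_SPEND, DAILY_CAP_USD)
        print(f"{name}: ${total:,.0f} spent, cap crossed at hour {hour}")
```

Run as a script, both variants cross the cap at hour 2, but the alert-only loop ends the night at $26,093 while the brake holds the day at $5,093. The overshoot past $5,000 is the hour in which the cap was crossed; in this model, checking spend more often than hourly would shrink it.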
