Not in any textbook — learned this from a 3am page:

1 / 2

Not in any textbook — learned this from a 3am page:

DEV Community·Neeraja Khanapure·26 days ago

#amr1KAZI

#sre #devops #kubernetes #terraform #alert #alerts

Reading 0:00

15s threshold

LinkedIn Draft — Workflow (2026-05-07) Not in any textbook — learned this from a 3am page: On-call burnout is an alert design problem, not a schedule problem Every team I've seen fight burnout by rotating people faster. The actual fix is almost always the same: the alerts are wrong. Alert quality spectrum: Noisy ◀─────────────────────────── ▶ Actionable [cpu > 80%] [pod restart] [error budget burn] [customer impact] │ │ │ │ ignore me maybe? investigate! wake me up Enter fullscreen mode Exit fullscreen mode Where it breaks: ▸ Alerts without a named owner and a runbook produce paralysis, not action — especially at 2am. ▸ Flapping alerts are the fastest path to alert blindness — engineers learn to dismiss pages before reading them. ▸ Cause-based alerts (disk full) and symptom-based alerts (latency spike) need different urgency and routing. The rule I keep coming back to: → Before any alert ships: Who acts on it? What do they do? What's the cost of 30 minutes of inaction?…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Create free account Log in

Menu

Not in any textbook — learned this from a 3am page: