LinkedIn Draft — Workflow (2026-05-07) Not in any textbook — learned this from a 3am page: On-call burnout is an alert design problem, not a schedule problem Every team I've seen fight burnout by rotating people faster. The actual fix is almost always the same: the alerts are wrong. Alert quality spectrum: Noisy ◀─────────────────────────── ▶ Actionable [cpu > 80%] [pod restart] [error budget burn] [customer impact] │ │ │ │ ignore me maybe? investigate! wake me up Enter fullscreen mode Exit fullscreen mode Where it breaks: ▸ Alerts without a named owner and a runbook produce paralysis, not action — especially at 2am. ▸ Flapping alerts are the fastest path to alert blindness — engineers learn to dismiss pages before reading them. ▸ Cause-based alerts (disk full) and symptom-based alerts (latency spike) need different urgency and routing. The rule I keep coming back to: → Before any alert ships: Who acts on it? What do they do? What's the cost of 30 minutes of inaction?…