Menu

Post image 1
Post image 2
1 / 2
0

Not in any textbook — learned this from a 3am page:

DEV Community·Neeraja Khanapure·26 days ago
#amr1KAZI
#sre#devops#kubernetes#terraform#alert#alerts
Reading 0:00
15s threshold

LinkedIn Draft — Workflow (2026-05-07) Not in any textbook — learned this from a 3am page: On-call burnout is an alert design problem, not a schedule problem Every team I've seen fight burnout by rotating people faster. The actual fix is almost always the same: the alerts are wrong. Alert quality spectrum: Noisy ◀─────────────────────────── ▶ Actionable [cpu > 80%] [pod restart] [error budget burn] [customer impact] │ │ │ │ ignore me maybe? investigate! wake me up Enter fullscreen mode Exit fullscreen mode Where it breaks: ▸ Alerts without a named owner and a runbook produce paralysis, not action — especially at 2am. ▸ Flapping alerts are the fastest path to alert blindness — engineers learn to dismiss pages before reading them. ▸ Cause-based alerts (disk full) and symptom-based alerts (latency spike) need different urgency and routing. The rule I keep coming back to: → Before any alert ships: Who acts on it? What do they do? What's the cost of 30 minutes of inaction?…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Read More