Not in any textbook — learned this from a 3am page:

DEV Community · Neeraja Khanapure · about 1 month ago
#kubernetes #devops #sre #terraform #pods #ready

LinkedIn Draft: Workflow (2026-04-28)

Kubernetes rollouts: why 'pods are Ready' is the wrong promotion gate

Readiness is a node-local signal. Production health is a global one. Most rollout pipelines conflate the two, and that's where incidents come from.

Bad gate:

    Deploy ──▶ Pods Ready? ──▶ Done

Good gate:

    Deploy ──▶ Pods Ready? (local signal)
                    │
                    ▼
        SLO window check (error rate + p95)
                    │
             Pass ──▶ Promote
             Fail ──▶ Auto-rollback

Where it breaks:

▸ 100% Ready pods while p95 latency spikes: bad cache warmup, noisy neighbor, DB connection saturation.
▸ HPA reacts slower than a fast rollout: you ship overload before autoscaling catches up.
▸ Canary stuck green because metrics lack the right labels/slices to isolate the failing segment.

The rule I keep coming back to:

→ Promote only when the canary holds your SLO slice (error rate + latency) for a fixed observation window. Otherwise: auto-rollback.…
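To make the rule concrete, here is a minimal Python sketch of an SLO window check. It is an illustration under stated assumptions, not the author's pipeline: the `Sample` and `Slo` shapes, the threshold values, the nearest-rank p95, and the choice to fail closed when the window has too little traffic are all hypothetical. In practice the samples would come from whatever metrics backend the rollout already uses.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One canary request seen during the observation window (hypothetical shape)."""
    latency_ms: float
    is_error: bool


@dataclass(frozen=True)
class Slo:
    """Hypothetical thresholds; substitute the SLO slice you actually promise."""
    max_error_rate: float  # e.g. 0.01 allows 1% of requests to fail
    max_p95_ms: float      # ceiling for p95 latency, in milliseconds
    min_samples: int       # below this, the window cannot prove health


def p95(latencies: Sequence[float]) -> float:
    """Nearest-rank 95th percentile, using integer math to avoid float rounding."""
    ordered = sorted(latencies)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def gate(window: Sequence[Sample], slo: Slo) -> str:
    """Return "promote" only if the whole window holds the SLO, else "rollback"."""
    if len(window) < slo.min_samples:
        return "rollback"  # assumption: too little traffic means fail closed
    error_rate = sum(s.is_error for s in window) / len(window)
    if error_rate > slo.max_error_rate:
        return "rollback"
    if p95([s.latency_ms for s in window]) > slo.max_p95_ms:
        return "rollback"
    return "promote"


slo = Slo(max_error_rate=0.01, max_p95_ms=300.0, min_samples=50)

# Every pod is Ready and no request fails, but 10% of requests are slow.
slow_tail = [Sample(100.0, False)] * 90 + [Sample(900.0, False)] * 10
healthy = [Sample(100.0, False)] * 100

print(gate(healthy, slo))    # promote
print(gate(slow_tail, slo))  # rollback
```

The `slow_tail` window is the first failure mode from the list: readiness would pass, yet the latency slice fails, so the gate rolls back. Running the same check per label (region, tenant, endpoint) rather than over the aggregate is one way to keep a canary from staying green while a single segment fails.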
