Kubernetes CronJobs silently fail more than you think


DEV Community·Kriss·27 days ago


#kubernetes #devops #monitoring #cronjob #fullscreen #missed


A backup job missed 24 days of runs. Nobody knew. The CronJob looked fine in kubectl get cronjobs. No alerts fired. The last successful run timestamp in the status field just sat there, quietly getting older.

The root cause: the CronJob controller had silently given up scheduling after missing 100 runs. Logged an error. Stopped trying. Moved on.

This article explains why Kubernetes CronJobs are structurally unreliable without external monitoring, and what you can do about it.

The three failure modes Kubernetes won't tell you about

1. The 100 missed-schedule limit

This is the one that produces the war stories. The Kubernetes CronJob controller checks how many schedules it has missed since the CronJob's last scheduled time. If that number exceeds 100, it permanently stops scheduling that CronJob and logs a single error line:

    Cannot determine if job needs to be started: too many missed start time (> 100)

That's it. No event. No alert.…
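To get a feel for how quickly a schedule can cross the 100-miss limit described above, here is a minimal Python sketch. It is not the controller's code. It assumes a fixed-interval schedule (a real cron expression can be irregular) and assumes misses are counted from the last scheduled time. The names `missed_runs`, `exceeds_limit`, and `MISSED_LIMIT` are hypothetical and exist only for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Threshold quoted in the controller's error message ("> 100").
MISSED_LIMIT = 100


def missed_runs(last_scheduled: datetime, now: datetime, interval: timedelta) -> int:
    """Count fixed-interval start times that fall in (last_scheduled, now].

    Simplification: treats the schedule as one run every `interval`,
    which only approximates a real cron expression.
    """
    if now <= last_scheduled:
        return 0
    # Floor division of two timedeltas yields an int.
    return (now - last_scheduled) // interval


def exceeds_limit(last_scheduled: datetime, now: datetime, interval: timedelta) -> bool:
    """True once the number of missed start times is strictly above the limit."""
    return missed_runs(last_scheduled, now, interval) > MISSED_LIMIT


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    outage_end = start + timedelta(days=24)
    for label, interval in [("hourly", timedelta(hours=1)), ("daily", timedelta(days=1))]:
        count = missed_runs(start, outage_end, interval)
        print(label, count, exceeds_limit(start, outage_end, interval))
```

Under these assumptions, an hourly job passes the limit a little over four days into an outage, while a daily job would need more than 100 days. The more frequent the schedule, the less time there is to notice.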
