Originally posted on celeryradar.com.

Workers are the part of Celery that actually do the work. When they stop, your application's background processing stops. That's the easy part to monitor. The harder part is that workers fail in ways that look healthy from the outside: the process is still running, the broker connection looks fine, the log file's last line is from this morning, and yet tasks aren't getting picked up. By the time somebody on your team notices, a downstream user has already noticed.

This guide covers what worker monitoring actually needs to catch (more than "is the process running"), why each of the three dominant detection approaches has known blind spots, the five ways workers go silent in production, and the specific implementation trap that causes naive heartbeat setups to fire false alerts during recovery.

Why worker death detection is harder than it looks

Worker monitoring isn't underserved the way beat schedule monitoring is. Every Celery monitoring tool tracks workers in some form.…
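To make "more than is the process running" concrete, here is a minimal sketch of a check that separates a dead worker from one that is alive but silent. It is a hypothetical illustration, not the approach this guide or any particular tool uses. It assumes you can already collect three signals per worker: whether the process is alive, when its last heartbeat arrived (for example from Celery's worker-heartbeat events, if events are enabled), and when it last started a task, plus the depth of the queue it consumes. The names `WorkerSnapshot`, `classify`, `heartbeat_timeout`, and `progress_timeout` are all invented for this example, and the thresholds are placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class WorkerState(Enum):
    HEALTHY = "healthy"
    DEAD = "dead"        # process gone, or heartbeats have stopped arriving
    STALLED = "stalled"  # alive and heartbeating, but not consuming a backlog


@dataclass
class WorkerSnapshot:
    # Hypothetical inputs; how you gather them depends on your broker and setup.
    process_alive: bool
    last_heartbeat: float               # unix seconds of the most recent heartbeat
    last_task_started: Optional[float]  # unix seconds the worker last began a task
    queue_depth: int                    # messages waiting on the worker's queue(s)


def classify(snapshot: WorkerSnapshot, now: float,
             heartbeat_timeout: float = 60.0,
             progress_timeout: float = 300.0) -> WorkerState:
    """Classify a worker from one snapshot. Thresholds are placeholder values."""
    # Liveness: the check most tools already do.
    if not snapshot.process_alive or now - snapshot.last_heartbeat > heartbeat_timeout:
        return WorkerState.DEAD
    # Progress: an idle worker on an empty queue is fine; an idle worker
    # sitting next to a backlog is the "looks healthy" failure.
    if snapshot.queue_depth > 0:
        last = snapshot.last_task_started
        if last is None or now - last > progress_timeout:
            return WorkerState.STALLED
    return WorkerState.HEALTHY


# Usage: process up, heartbeat 5 s old, but nothing started in 15 minutes
# while 42 tasks wait.
now = 10_000.0
stuck = WorkerSnapshot(process_alive=True, last_heartbeat=now - 5,
                       last_task_started=now - 900, queue_depth=42)
print(classify(stuck, now).value)  # prints: stalled
```

Gating the progress check on a non-empty queue keeps a worker that is merely idle on a quiet queue from being flagged as stalled.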