Two separate monitoring failures on the same day, same root cause. Both fixed by answering a single question: "Am I testing for health, or am I testing for perfect conditions?" The distinction matters because perfect conditions are temporary, and health is structural. And once you see the pattern once, you see it everywhere. Context: production on a shared VPS The Braves stack runs on Contabo (24 GiB RAM, 6 CPUs). Five Docker stacks share that hardware: Braves (frontend, backend, pybaseball), Plane (13 containers), Twenty (5 containers), Umami (3 containers), and ntfy (1 container). 25 containers total. Single ingress: Caddy reverse proxy. Single disk. When one stack's load spikes, all five feel it. This architecture means healthchecks and deployment validators are sensitive to global state, not just stack-local state. A healthcheck that works under isolated test conditions can fail when the VPS is under collective load.…