Orchestration Allows Microservices to Be Unreliable (And That's a Good Thing)

One of the first features I wanted to build for Kubernetes was service workflows: Service A starts, then B, then C. If B fails, A should know, and C shouldn't panic. Services need to know when their dependencies are ready.

On a whiteboard, that sounded trivial. In production, it's a nightmare. Is a service healthy if the container starts but blocks on I/O? What if the probe returns "OK" while the API stalls for ten seconds? What if a job needs two dependencies and only one appears? These edge cases turn start-up ordering into a distributed minefield.

I showed the plan to Brian Grant, who had lived through every permutation of failure inside Google. He shook his head: "You're solving the wrong problem. Treat startup quirks as just another failure mode. Build for failure, full stop."

That was the first time I internalized a hard truth: all containers, all nodes, all networks fail, often and in weird ways.…
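
To make the "treat startup quirks as just another failure mode" advice concrete, here is a minimal sketch of what it might look like from the caller's side. Instead of waiting for a dependency to be declared ready, the caller retries with capped exponential backoff and jitter, so a dependency that has not started yet is handled the same way as one that crashed mid-flight. The names `call_with_retry` and `FlakyDependency` are hypothetical and exist only for this illustration; they are not part of Kubernetes or any real library, and the retry parameters are arbitrary assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: retry a dependency call with capped backoff and jitter.

    It does not distinguish "not started yet" from "failed later"; both
    surface as connection or timeout errors and get the same treatment.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of retries: let the caller see the failure
            # Exponential backoff, capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")


class FlakyDependency:
    """Hypothetical stand-in for a service B that is not ready at first."""

    def __init__(self, failures_before_ready: int) -> None:
        self.failures_before_ready = failures_before_ready
        self.calls = 0

    def fetch(self) -> str:
        self.calls += 1
        if self.calls <= self.failures_before_ready:
            raise ConnectionError("service B is not ready")
        return "ok"


if __name__ == "__main__":
    dependency = FlakyDependency(failures_before_ready=2)
    print(call_with_retry(dependency.fetch))
```

The design choice worth noting is that nothing here depends on start order: Service A can come up before, after, or while B restarts, and the same code path covers all three cases.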