When I started working on medical AI infrastructure at a university hospital, the first real question wasn't about models or FHIR or GPU scheduling. It was simpler: how do you structure a system that multiple research teams can use, without it turning into a nightmare to maintain?

That question led me to rethink how medical AI actually gets deployed in practice.

The Monolithic Approach - and why it breaks

The most common starting point is a monolithic setup: one codebase, one container, one pipeline. You take your model, wrap it in a FastAPI endpoint, dockerize it, and ship it.

It works. Until a second team wants to plug in a different model. Suddenly you're touching the core codebase for every new AI service. Deployments become risky. The team maintaining the core is now a bottleneck for everyone else. And nobody is happy.

In a hospital research environment, this happens faster than you'd expect.…
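
To make the coupling concrete, here is a minimal sketch of what the core of such a monolith tends to look like once a second team arrives. It is plain Python rather than a real FastAPI app, and every name in it (`team_a_model`, `team_b_model`, `handle_request`, the `"values"` payload key) is a hypothetical placeholder, not code from an actual hospital system.

```python
from typing import Any


def team_a_model(payload: dict[str, Any]) -> dict[str, Any]:
    # Placeholder for team A's inference code (hypothetical).
    return {"team": "a", "n_inputs": len(payload.get("values", []))}


def team_b_model(payload: dict[str, Any]) -> dict[str, Any]:
    # Placeholder for team B's inference code (hypothetical).
    return {"team": "b", "n_inputs": len(payload.get("values", []))}


def handle_request(service: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Core handler of the monolith: one branch per AI service."""
    if service == "team-a":
        return team_a_model(payload)
    # Supporting team B required editing this core function and
    # redeploying the whole container along with team A's model.
    if service == "team-b":
        return team_b_model(payload)
    raise ValueError(f"unknown service: {service}")


if __name__ == "__main__":
    print(handle_request("team-a", {"values": [1, 2, 3]}))
```

Every new service adds another branch to `handle_request`, so each team's release goes through the same file, the same review queue, and the same deployment.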