I’ve been running GPU inference workloads for about two years now and for most of that time I had the same problem: every time I wanted to move a workload to a different provider, I was essentially starting from scratch on the deployment config. Not because the actual workload changed. The code was the same, the container was the same. But all the infrastructure glue — the scheduling constraints, the node selectors, the provider-specific API calls, the health check logic — was baked into the config in ways that assumed a specific provider’s environment. Moving meant unpicking all of that and rebuilding it for wherever we were going. I tried a few things to fix this. Terraform helped with provisioning but didn’t solve the actual problem. I could terraform my way to nodes on a different provider. I still had to tell each workload where to run and update that when things changed. I tried writing an abstraction layer that sat between our deployment scripts and the provider APIs. That worked for a while.…