We run a mixed GPU inference stack — H100s, H200s, RTX 5090s depending on availability and cost at any given time. For about a year, every time we wanted to shift workloads between providers we were effectively rebuilding deployment configs from scratch. Not because the workloads changed. Because the configs were hardcoded to one provider’s infrastructure. This is the actual GPU vendor lock-in problem and it took us embarrassingly long to name it correctly. What we thought the problem was We thought we were locked in because of which provider we were on. So we focused on making it easier to switch providers — Terraform for infrastructure provisioning, containerized workloads, documented migration runbooks. This helped at the infrastructure layer. It didn’t help at the workload layer. When we wanted to move a specific workload from Provider A to Provider B, we still had to update scheduling config, test on new hardware, debug provider-specific quirks, update monitoring.…