Last month I was debugging a prompt template for a vLLM inference service. The change was two lines: swap the system prompt and adjust the temperature. To test it, I had to:

1. Rebuild a 15GB Docker image (the CUDA base alone is 3.5GB)
2. Push it to our registry (8 minutes on a good day)
3. Wait for Kubernetes to pull it on a GPU node
4. Realize the prompt still wasn't right
5. Repeat

Total cycle time: 22 minutes per iteration. For a two-line text change.

Then I tried Docker Model Runner. Pull the model once. Run inference locally. Iterate on the prompt in seconds. Push only when it's right. The same change took 14 seconds.

Docker shipped two features this year that I think every GPU/AI engineer needs to know about: Model Runner and Sandboxes. This post is the walkthrough I wish I had when I started using them.

My background: I build GPU infrastructure tools (keda-gpu-scaler for GPU autoscaling on Kubernetes, otel-gpu-receiver for GPU observability), and I contributed GPU NUMA topology scheduling to CNCF Volcano.…