GPU container images are the softest target in your infrastructure. A typical vLLM image is 15 GB, with hundreds of packages, a CUDA runtime, Python dependencies, and model weights. Most teams build these images once, push them, and never scan them again. That's a problem.

I've been building GPU infrastructure tools on Docker and Kubernetes for the past year: keda-gpu-scaler for autoscaling, otel-gpu-receiver for observability, and GPU NUMA topology scheduling for Volcano. Every one of these ships as a Docker container. This post walks through the zero-trust pipeline I use to build, scan, sign, and deploy GPU containers, from docker build to production.…