Most teams think GPU scheduling starts with the scheduler. It starts with demand modeling. By the time Volcano, Kueue, or KEDA enters the conversation, the expensive mistake has usually already been made. The cluster was provisioned against a theoretical peak that rarely materializes. The demand curve was never drawn. The concurrency profile was assumed rather than measured. The core argument: GPU scheduling is not a capacity solution. It is a capacity enforcement layer. If you provisioned against the wrong demand curve, the scheduler cannot save you. The Demand Model Preflight Before you talk about schedulers, answer four questions: 1. What is your real concurrency floor? Not peak theoretical demand. The minimum sustained parallel work your cluster must support without queue collapse. If you cannot answer this from measurement, you don't have a demand model — you have an assumption. 2. What is burst, and what is noise?…