Modern LLM serving is hard to tune because each deployment is a stack of interacting choices: model backend, tensor-parallel shape, prefill/decode split, worker counts, scheduler settings, routing policy, KV cache behavior, autoscaling thresholds, and topology. These choices interact across layers, so a local improvement can simply move the bottleneck somewhere else. For larger models, even one realistic experiment can require many GPUs or nodes before we learn whether the idea was worth testing.

That is the motivation for DynoSim: a Dynamo twin. DynoSim is a workload-driven discrete-event simulation of the NVIDIA Dynamo serving stack. It combines measured engine forward-pass timing, Mocker scheduler cores, Router and Planner behavior, KV cache effects, and workload traces on one virtual timeline. The goal is neither a purely analytical estimate nor a bit-exact hardware emulator.…
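
To make "discrete-event simulation on one virtual timeline" concrete, here is a minimal sketch of the general technique. It is not DynoSim code: the `Event` type, the `simulate` function, the single FIFO worker, and the fixed `service_time` are all hypothetical simplifications chosen for illustration. The point is only that the clock jumps from event to event instead of ticking in real time, so a long workload trace can be replayed without any GPU.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    # Events are ordered by virtual time, then by insertion order (seq) to break ties.
    time: float
    seq: int
    kind: str = field(compare=False)        # "arrive" or "finish" (hypothetical event kinds)
    request_id: int = field(compare=False)


def simulate(arrivals, service_time):
    """Replay request arrival times against one FIFO worker on a virtual clock.

    arrivals: arrival timestamps in seconds (a stand-in for a workload trace).
    service_time: fixed per-request cost in seconds (an assumption; a realistic
        model would look up measured timings instead).
    Returns a dict mapping request index to completion time.
    """
    events = []
    seq = 0
    for request_id, t in enumerate(arrivals):
        heapq.heappush(events, Event(t, seq, "arrive", request_id))
        seq += 1

    worker_free_at = 0.0
    completed = {}
    while events:
        ev = heapq.heappop(events)   # jump straight to the next event in virtual time
        now = ev.time
        if ev.kind == "arrive":
            # The request starts when it arrives or when the worker frees up,
            # whichever is later; queueing delay falls out of this max().
            start = max(now, worker_free_at)
            worker_free_at = start + service_time
            heapq.heappush(events, Event(worker_free_at, seq, "finish", ev.request_id))
            seq += 1
        else:
            completed[ev.request_id] = now
    return completed


if __name__ == "__main__":
    # Request 1 arrives while request 0 is in service and queues behind it;
    # request 2 arrives after the worker has gone idle.
    print(simulate([0.0, 0.1, 5.0], service_time=1.0))
```

A simulator of a full serving stack would add more event kinds (for example scheduling, routing, or scaling decisions) and replace the constant service time with measured data, but the event loop keeps this same shape.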