On-policy distillation (OPD) has become the go-to recipe for squeezing a large language model’s capabilities into a smaller student after training. The process, however, inherits a hidden bias: the student learns to mimic a teacher that has access to privileged context, and it consequently reports confidence scores that are far too optimistic. Recent work shows that this optimism can be tamed without giving up the accuracy gains that distillation promises.

Traditional OPD treats the teacher’s token probabilities as a signal of both what to say and how sure to be. Because the teacher’s confidence is conditioned on information unavailable at deployment, the student ends up with a systematic “certainty illusion.” The paper formalizes this mismatch as a scaling law of miscalibration, arguing that privileged context collapses entropy and drives optimism, the same mechanism that makes the student’s logits sharper than they should be. CaOPD rewrites that recipe.…
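
To make the “certainty illusion” concrete, here is a minimal toy sketch in Python. It is not the paper's setup or the CaOPD method. It assumes a hypothetical two-answer task with a binary privileged context, and every number and name in it (`p_context`, `p_answer_given_context`, `summarize`) is invented for illustration. It compares the entropy of the answer with and without the context, and measures how far a student's confidence would overshoot its accuracy if it simply copied the teacher's confidence.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


# Hypothetical toy numbers (assumptions, not taken from the paper).
# p_context[c] is the probability of privileged context c.
# p_answer_given_context[c][y] is the teacher's belief in answer y given c.
p_context = [0.5, 0.5]
p_answer_given_context = [
    [0.9, 0.1],
    [0.2, 0.8],
]


def marginal_answer(p_c, p_y_given_c):
    """Answer distribution once the privileged context is averaged out."""
    num_answers = len(p_y_given_c[0])
    return [
        sum(pc * row[y] for pc, row in zip(p_c, p_y_given_c))
        for y in range(num_answers)
    ]


def summarize(p_c, p_y_given_c):
    p_y = marginal_answer(p_c, p_y_given_c)
    # Average entropy and top-answer confidence of a teacher that sees c.
    teacher_entropy = sum(pc * entropy(row) for pc, row in zip(p_c, p_y_given_c))
    teacher_confidence = sum(pc * max(row) for pc, row in zip(p_c, p_y_given_c))
    # Best accuracy available to a student that never sees c.
    student_accuracy = max(p_y)
    return {
        "teacher_entropy": teacher_entropy,
        "student_entropy": entropy(p_y),
        "teacher_confidence": teacher_confidence,
        "student_accuracy": student_accuracy,
        # Gap if the student reports the teacher's confidence as its own.
        "overconfidence_gap": teacher_confidence - student_accuracy,
    }


stats = summarize(p_context, p_answer_given_context)

if __name__ == "__main__":
    for name, value in stats.items():
        print(f"{name}: {value:.3f}")
```

With these toy numbers the teacher is on average 0.85 confident in its top answer, while a student without the context can be right at most 0.55 of the time, so copied confidence overshoots by about 0.30. The average entropy given the context is also lower than the entropy without it, which is the "entropy collapse" the paragraph above describes, shown here only in a simplified form.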