A few weeks ago, I shipped the first reference implementation of a small specification I'd been working on. Eight YAML fields, a SHA-256 hash, and the rule that the hash gets computed before the experiment runs. The point was modest: if you're going to publish an ML accuracy claim, you ought to be able to prove that the threshold you wrote down was the threshold you committed to, not the one you settled on after seeing the test set. The spec is called PRML — Pre-Registered ML Manifest. The Python reference implementation took a weekend. Then I made the mistake of writing it in JavaScript. Then in Go. Then in Rust. Each one found a different bug. Not in the implementations. In the spec.…