The most dangerous sentence in AI delivery is: "It is done."

That sentence is not evidence. AI can write confidently. A summary can look complete. A PR description can be polished. None of that proves the work is actually complete.

A project-specific AI delivery pipeline should redefine "done" as an evidence question: what reviewable proof supports each acceptance criterion? That is the evidence contract.

Tests matter, but they are not everything

Tests are one of the most important forms of evidence. They are not the only form.

A backend function fix may be covered by unit and integration tests. A frontend interaction change may also need screenshots or a recording. A data-link fix may need API output, logs, read-only SQL, or queue observation. A SketchUp modeling tool may need a design model diff, bridge trace, top-view screenshot, and live bridge smoke.…
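
As a rough illustration of the evidence contract described above, the sketch below maps each acceptance criterion to a list of reviewable evidence items and treats the work as done only when every criterion has at least one. The class names, the `kind` labels, and the file paths are all hypothetical and chosen for illustration; a real pipeline would likely also check that each piece of evidence is the right kind for the change.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    kind: str       # hypothetical label, e.g. "integration_test", "screenshot"
    reference: str  # where a reviewer can inspect it (path, URL, log id)


@dataclass
class AcceptanceCriterion:
    description: str
    evidence: list[Evidence] = field(default_factory=list)


def unsupported_criteria(criteria: list[AcceptanceCriterion]) -> list[str]:
    """Return the descriptions of criteria that have no evidence attached."""
    return [c.description for c in criteria if not c.evidence]


def is_done(criteria: list[AcceptanceCriterion]) -> bool:
    """Done means: at least one criterion exists and none lacks evidence."""
    return bool(criteria) and not unsupported_criteria(criteria)


# Hypothetical frontend change: one criterion is backed by a test and a
# screenshot, the other has only been claimed, not shown.
criteria = [
    AcceptanceCriterion(
        "Login button submits the form",
        [
            Evidence("integration_test", "tests/test_login.py"),
            Evidence("screenshot", "evidence/login.png"),
        ],
    ),
    AcceptanceCriterion("Error message is shown on a bad password"),
]

print(is_done(criteria))
print(unsupported_criteria(criteria))
```

The point of the design is that a confident summary cannot change the result: `is_done` only looks at attached evidence, so "It is done" stays false until a reviewer has something to inspect for every criterion.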