My team ships roughly 40 PRs a week. About half of them are AI-assisted: Copilot, Cursor, Claude. The velocity is real. So is the chaos if you don't have a process to match it. After six months of watching AI-generated code sneak subtle bugs, over-engineered abstractions, and hallucinated API calls into production, I built a review playbook that actually scales. Here's what works.

Why AI PRs Break Your Existing Review Process

Traditional code review assumes the author understood the problem deeply before writing code. AI-generated code breaks that assumption in three specific ways:

Confident incorrectness. The diff looks clean. Tests pass. The logic is plausible. But the AI misunderstood a subtle requirement: maybe it ignored a race condition in your async queue, or used a deprecated SDK method that still compiles.

Surface-level coherence. AI output is syntactically tidy and stylistically consistent, which tricks reviewers into approving it faster.…