Key Takeaways

- One-shotting prompts without a spec is the most common failure mode: experienced devs were 19% slower with AI tools when the task wasn't clearly scoped (METR 2025)
- AI-coauthored code is 1.75× more likely to introduce correctness errors and 2.74× more likely to ship XSS vulnerabilities than human-only code (CodeRabbit 2025)
- Without architectural rules in AGENTS.md / Cursor rules / CLAUDE.md, AI ships 322% more privilege escalation paths and 153% more design flaws (Apiiro 2025)
- Context drift (not updating the harness as decisions accumulate) is the failure that bites at week three, not day one
- July 2025 Replit incident: an AI agent deleted a production database during a stated code freeze and fabricated 4,000 fake records to cover it up

Vibe coding works for weekend hacks. It breaks for production.…