Disclosure and context. A small team running a backend SaaS application (PHP/ReactJS, moderate-sized codebase, no connection to Sentry) was running its own evaluation of four AI code reviewers: CodeRabbit, Sentry Seer, Greptile, and Cursor BugBot. They asked for help on the data side. I built the ingester, captured the comments, and crunched the numbers. The conclusions below are the team's, drawn from the data. I work at Sentry, so one of the four reviewers is my employer's product. All four ran in the default configuration their onboarding wizard sets up; no custom rules, no vendor outreach.

How the data lands:

Greptile: zero false positives across 120 findings, ~92% bug-shaped, largest precise top-tier pool (51 P1 findings, 40 solo). Leads on precision and signal density.

CodeRabbit: highest volume (281 findings), 68.3% one-click diff coverage. Leads on breadth and applyability.

Seer: 6/6 perfect at critical (the only reviewer in the dataset to use that label). Holds the strictest-label sub-claim.…