We added a npm run ilb:flagship:smoke gate to the quality script. It's small: for each flagship rule with a labeled corpus, run the rule against vulnerable/* (must fire) and safe/* (must stay silent). Compute precision, recall, F1. Fail the build below F1=1.00. The first run hit nine rules. Six passed. Three failed. Rule Result What broke react-features/hooks-exhaustive-deps P=67% R=100% F1=0.80 False positive on the standard .then((r) => r.json()) pattern mongodb-security/no-unsafe-query P=100% R=50% F1=0.67 Missed $where injection via template-literal interpolation vercel-ai-security/no-unsafe-output-handling P=— R=0% F1=— Found nothing in const { text } = await generateText(...); el.innerHTML = text All three rules had passing unit-test suites. All three had been benchmarked alongside peer plugins on real OSS for weeks. None of those signals would have surfaced these bugs.…