I keep running into this problem. Every tool claims critical vuln detection. Every scanner has a hero case study. Every AI audit product shows a nice report.
But for a dev team trying to decide what to add before an audit, what's the real basis for comparison?
Too often: reputation + vibes + a better landing page.
We need public benchmarking. Test everyone on identical cases.
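To make "identical cases" concrete, here is a minimal sketch of what a shared scoring step could look like. Everything in it is an assumption for illustration: the tool names, the finding IDs, and the choice of recall and precision as metrics are all hypothetical, and this is not how any particular benchmark actually scores submissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    recall: float     # share of known vulns the tool found
    precision: float  # share of the tool's reports that were real


def score_tool(reported: set[str], known: set[str]) -> Score:
    """Score one tool's findings against the shared ground truth."""
    true_positives = len(reported & known)
    recall = true_positives / len(known) if known else 0.0
    precision = true_positives / len(reported) if reported else 0.0
    return Score(recall, precision)


# Hypothetical ground truth: the same cases for every tool.
KNOWN = {
    "case1:reentrancy",
    "case2:access-control",
    "case3:oracle-manipulation",
    "case4:rounding",
}

# Hypothetical findings reported by two made-up tools.
REPORTS = {
    "tool_a": {"case1:reentrancy", "case2:access-control", "case2:unused-variable"},
    "tool_b": {"case1:reentrancy", "case3:oracle-manipulation", "case4:rounding"},
}


def leaderboard(reports: dict[str, set[str]], known: set[str]) -> list[tuple[str, Score]]:
    """Rank tools by recall first, then precision, best first."""
    scores = {name: score_tool(found, known) for name, found in reports.items()}
    return sorted(
        scores.items(),
        key=lambda item: (item[1].recall, item[1].precision),
        reverse=True,
    )


if __name__ == "__main__":
    for name, s in leaderboard(REPORTS, KNOWN):
        print(f"{name}: recall={s.recall:.2f} precision={s.precision:.2f}")
```

The point is only that once the case set is fixed and public, the comparison becomes a number anyone can recompute instead of a case study.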
EVMBench is the best reference I've found. What benchmarks are you using internally?