Can LLMs Audit Smart Contracts? Benchmarking Claude Opus 4.7, GPT-5.5, and Gemini 3.1 Pro


DEV Community·Fahriddin·26 days ago


#software #coding #development #model #claude #contract


I gave 56 known-vulnerable Solidity smart contracts to three frontier LLMs (Claude Opus 4.7, GPT-5.5, and Gemini 3.1 Pro) and asked each one to find the bugs. 168 API calls, ~$5, and a couple of surprises later, here is what the data says.

Claude finds the most bugs (98.2%). GPT-5.5 localizes them most precisely (92.9% strict recall). Gemini sits in the middle at 89.3%, but only after I caught a benchmarking gotcha that was silently costing it 20 points.

This article walks through how the experiment was run, what the numbers actually mean, and why "which model is the best auditor" depends entirely on what you are optimizing for.

Why This Question Matters

DeFi protocols hold tens of billions of dollars in user funds, and every dollar of it is secured by smart contract code that is open-source, public, and immutable once deployed. A professional audit from a reputable firm costs upwards of $50,000 per contract and takes weeks to complete.…
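As a quick sanity check on the headline figures, the sketch below recomputes the call count and works out which whole-number counts out of 56 would round to each quoted percentage. It assumes every percentage is a count of contracts divided by 56, which the excerpt does not state outright; the helper name `implied_count` and the dictionary labels are illustrative, not taken from the article.

```python
# Sanity check of the headline numbers.
# Assumption (not stated in the excerpt): each percentage is k / 56 for some
# whole number k of contracts, rounded to one decimal place.

CONTRACTS = 56
MODELS = 3


def implied_count(percent, total=CONTRACTS):
    """Return the count k such that 100 * k / total rounds to `percent`, or None."""
    for k in range(total + 1):
        if round(100 * k / total, 1) == percent:
            return k
    return None


# Percentages as quoted in the introduction.
reported = {
    "Claude Opus 4.7 (bugs found)": 98.2,
    "GPT-5.5 (strict recall)": 92.9,
    "Gemini 3.1 Pro": 89.3,
}

# One call per contract per model.
total_calls = CONTRACTS * MODELS

# Map each label to the contract count its percentage implies.
implied = {name: implied_count(pct) for name, pct in reported.items()}

if __name__ == "__main__":
    print(f"total API calls: {total_calls}")
    for name, count in implied.items():
        print(f"{name}: {count} of {CONTRACTS}")
```

Under that assumption the quoted figures line up with 55, 52, and 50 contracts out of 56. Note that the Claude and GPT-5.5 numbers measure different things (bugs found versus strict recall), so the counts are not directly comparable.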
