Menu

Post image 1
Post image 2
Post image 3
1 / 3
0

A 4-model adversarial review pipeline that costs $0.30 per audit

DEV Community·Mike·27 days ago
#FfaqYjro
#about#opensource#ai#claude#panel#crucible
Reading 0:00
15s threshold

TL;DR: Single-model code review has correlated blind spots. A panel of frontier models from different vendor families catches more, but only if a final verification step throws out the hallucinations the panel converged on. Crucible is a Claude Code skill that does both for about $0.30 a run. If GPT misses a bug, the runner-up GPT-class model usually misses it too. Same training data, same blind spots. The fix isn't a bigger model. It's a panel of structurally different ones. I built Crucible to test this. It's a Claude Code skill that walks a codebase file by file, runs each file through four frontier models from different vendor families, then has Claude verify the findings against the actual source before showing them to me. This post is about the orchestration tricks that made it actually work. ## The panel Four models, four families. As of writing, the default panel is: DeepSeek V4-Pro Google Gemini 3.1 Pro Moonshot Kimi K2.6 MiniMax M2.7 Family diversity matters more than raw benchmark scores.…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Read More