A protocol for auditing AI agent harnesses

1 / 2

A protocol for auditing AI agent harnesses

DEV Community·Evangelos Pappas·24 days ago

#cKv6dtXU

#ai #agents #harness #machinelearning #verifier #failure

Reading 0:00

15s threshold

I have been building coding agents for the last several months and watching every component I added fail to move the resolve-rate I cared about. A verifier first. Multi-candidate sampling next. A structured-output sub-agent after that. Each was justified by a specific observed failure mode and each looked cheap at the margin. None of them helped. The Tsinghua paper Natural-Language Agent Harnesses , run on SWE-bench Verified at GPT-5.4 high reasoning, explains the loss directly: a same-model verifier on top of a baseline coding agent regresses task success on OSWorld by 8.4 percentage points, and multi-candidate sampling regresses it by 5.6 the same way. Both lose for the same structural reason. The verifier and the proposer are the same model as the doer. They share its training distribution, its priors, its failure modes. When the doer is confidently wrong, the verifier endorses the wrong output with the same confidence. The check does not catch errors. It approves them. The pattern generalises.…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Create free account Log in

Menu

A protocol for auditing AI agent harnesses