Anthropic's $200 Experiment: How AI Success Rate Jumped From 20% to 100% With a Harness


DEV Community: python·TengLongAI2026·2 days ago


#dev #harness #success #agents #agent #opus


Summary

Anthropic ran a controlled experiment: Opus 4.5 solo ($9) = 20% success. Add a Harness (5 subsystems) = 100% success at $200. OpenAI confirmed with a million-line repo: one AGENTS.md file changed everything. Stop swapping models. Build your harness first.

The Experiment

| Config | Cost | Success Rate |
| --- | --- | --- |
| Opus 4.5 solo | $9 | 20% |
| Opus 4.5 + Harness | $200 | 100% |

The $191 premium was all verification loops: compile, test, lint, type check.

The 5 Harness Subsystems

| Subsystem | What It Prevents |
| --- | --- |
| Instructions | Agent doesn't know project conventions |
| Tools | Unauthorized operations, accidental deletes |
| Environment | "Works on my machine" syndrome |
| State | Cross-session amnesia |
| Feedback | Premature victory declarations |

The 3 Fatal Failure Modes

1. Premature Victory: Agent writes 500 lines, declares "done", CI goes red. Fix: Pre-commit hook: npx tsc --noEmit
2. Context Amnesia: Agent adds feature but breaks existing one. Fix: MEMORY.md, read previous state before acting.
3. Tool Abuse: Agent runs destructive commands without asking.…
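The first two fixes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the harness used in the experiment: the helper names (read_memory, run_checks, may_declare_done) and the PROJECT_CHECKS table are hypothetical, and only the npx tsc --noEmit command and the MEMORY.md filename come from the text. The idea is that the agent reads prior state before acting and may only report "done" once every verification command exits cleanly.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


# Only the type-check command is taken from the article. A real project would
# add its own test and lint commands here (assumption, not from the source).
PROJECT_CHECKS = {
    "typecheck": ["npx", "tsc", "--noEmit"],
}


def read_memory(path="MEMORY.md"):
    """Return notes left by earlier sessions, or "" if the file does not exist."""
    memory_file = Path(path)
    if not memory_file.exists():
        return ""
    return memory_file.read_text(encoding="utf-8")


def run_checks(checks):
    """Run each check command and record whether it exited with status 0."""
    results = []
    for name, command in checks.items():
        try:
            proc = subprocess.run(command, capture_output=True, text=True)
            output = proc.stdout + proc.stderr
            results.append(CheckResult(name, proc.returncode == 0, output))
        except OSError as exc:  # for example, the tool is not installed
            results.append(CheckResult(name, False, str(exc)))
    return results


def may_declare_done(results):
    """Allow "done" only if at least one check ran and every check passed."""
    return bool(results) and all(r.passed for r in results)
```

A harness loop would call read_memory() before the agent starts, then run_checks(PROJECT_CHECKS) after each change, and feed the output of any failed check back to the agent instead of accepting its "done".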
