Menu

Post image 1
Post image 2
1 / 2
0

Anthropic's $200 Experiment: How AI Success Rate Jumped From 20% to 100% With a Harness

DEV Community: python·TengLongAI2026·2 days ago
#S7svSygj
#dev#harness#success#agents#agent#opus
Reading 0:00
15s threshold

Summary Anthropic ran a controlled experiment: Opus 4.5 solo ($9) = 20% success . Add a Harness (5 subsystems) = 100% success at $200. OpenAI confirmed with a million-line repo: one AGENTS.md file changed everything. Stop swapping models. Build your harness first. The Experiment Config Cost Success Rate Opus 4.5 solo $9 20% Opus 4.5 + Harness $200 100% The $191 premium was all verification loops: compile, test, lint, type check. The 5 Harness Subsystems Subsystem What It Prevents Instructions Agent doesn't know project conventions Tools Unauthorized operations, accidental deletes Environment "Works on my machine" syndrome State Cross-session amnesia Feedback Premature victory declarations The 3 Fatal Failure Modes Premature Victory — Agent writes 500 lines, declares "done", CI goes red. Fix : Pre-commit hook: npx tsc --noEmit Context Amnesia — Agent adds feature but breaks existing one. Fix : MEMORY.md — read previous state before acting. Tool Abuse — Agent runs destructive commands without asking.…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Read More