
I built the same MVP twice. The autonomous agent wrote 4.6x more tests — none caught two stubbed core methods.

DEV Community·Lutz Leonhardt·25 days ago


#agents #ai #automation #build #contract #keppt


Side-post in the keppt build-in-public series: an interlude before the Phase 1 implementation write-up lands.

While building Phase 1 of keppt, I ran a side experiment. Same architecture spec, two builds. One I curated over a day with Claude Code and Codex: plan first, tasks derived, agent-implemented per task, adversarial review, iterate. One I handed off to Factory.ai's Droid in their new Mission Control mode. Spec in, walk away, come back when the budget runs out.

Here is what came back.

the numbers

| | Curated (Claude Code + Codex) | Autonomous (Droid Mission Control) |
|---|---|---|
| Wall-clock effort | ~1 day | ~4 hours autonomous |
| Source LOC | 1,367 | 2,370 |
| Test LOC | 1,317 | 6,015 |
| Test cases | 69 | 339 |
| Working CLI? | yes | no |
| `LocalFileRepository.edit()` | real, CAS + audit | stub, ignores edits |
| `LocalFileRepository.search()` | real full-text + scope | stub, returns `[]` |
| Path-safety vectors | 8 | 13 |

The autonomous build wrote 4.6× more test LOC than the curated build, with roughly five times as many test cases.…
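To make the gap between test volume and test coverage concrete, here is a minimal Python sketch of one way a stubbed `edit()` and `search()` can slip past a test suite. It is an illustration under assumptions, not the code or the tests from either build: `StubRepository`, `RealRepository`, `shallow_check`, and `round_trip_check` are hypothetical names, the in-memory dict stands in for the file system, and the method signatures are guesses rather than keppt's actual `LocalFileRepository` API.

```python
class StubRepository:
    """Hypothetical stand-in for a repository whose core methods are stubs."""

    def __init__(self):
        self._docs = {}  # path -> text; in-memory substitute for files

    def write(self, path, text):
        self._docs[path] = text

    def read(self, path):
        return self._docs[path]

    def edit(self, path, new_text):
        # Stub: accepts the call and silently ignores the edit.
        return None

    def search(self, query):
        # Stub: always reports no matches.
        return []


class RealRepository(StubRepository):
    """Hypothetical minimal working version of the same two methods."""

    def edit(self, path, new_text):
        self._docs[path] = new_text

    def search(self, query):
        return [path for path, text in self._docs.items() if query in text]


def shallow_check(repo):
    """Shape-only check: the calls do not raise and search returns a list."""
    repo.write("a.md", "hello world")
    repo.edit("a.md", "goodbye world")
    return isinstance(repo.search("world"), list)


def round_trip_check(repo):
    """Behavioural check: the edit must be readable and searchable afterwards."""
    repo.write("a.md", "hello world")
    repo.edit("a.md", "goodbye world")
    return repo.read("a.md") == "goodbye world" and repo.search("goodbye") == ["a.md"]


if __name__ == "__main__":
    for repo_cls in (StubRepository, RealRepository):
        print(repo_cls.__name__, shallow_check(repo_cls()), round_trip_check(repo_cls()))
```

In this sketch the shape-only check passes for both classes, while the round-trip check passes only for the working one. Any number of shape-only cases could be added without ever exercising the behaviour the stub is missing.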
