The task I used to test this I did not want to test something new. No hello-world prompt. No "generate a function" demo. I gave it a task I would normally spend 30–45 minutes on myself: `"Build a complete sales dashboard application and prepare it for deployment."` Enter fullscreen mode Exit fullscreen mode This is a task that typically requires: `Prompt → Review output → Fix errors → Re-prompt → Adjust → Fix again → Finalise` Enter fullscreen mode Exit fullscreen mode You stay in the loop the entire time. That is the normal pattern with Claude, GPT or any standard AI coding assistant. The model responds step-by-step. You guide every step. I hit enter. Watched it start. Then I closed my laptop and did not check back. What I expected vs what I found When I came back, I expected one of three outcomes: `` > 1. Half-finished output with obvious gaps > 2. Broken code requiring significant fixes > 3.…