TL;DR I run three client projects plus an OSS repo. Agentic coding got good enough that for a lot of tasks I just hand over a goal and a list of acceptance criteria and let it run. The catch: Opus 4.7's reliability made me nervous enough that I started ending almost every task with a manual round of "now have Codex look at this." That ritual happened often enough that I automated it. Galley is the result: a local runtime where Claude Code executes a task inside a git worktree, then a supervisor (Claude, or Codex when I want a different reviewer) checks the run evidence against the acceptance criteria and either bounces it back for another attempt or approves it and opens a PR. The parts I'm happiest with aren't the loop itself.…
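For illustration only, the execute-then-review loop described above can be sketched roughly as follows. This is not Galley's actual code: `Task`, `Verdict`, `RunResult`, and `run_task` are hypothetical names, the executor, supervisor, and PR step are passed in as plain callables instead of real Claude Code, Codex, or git calls, and the `max_attempts` cap is an assumption the post does not state.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Task:
    goal: str
    acceptance_criteria: List[str]


@dataclass
class Verdict:
    approved: bool
    feedback: str = ""  # what the supervisor wants fixed when it bounces the run


@dataclass
class RunResult:
    approved: bool
    attempts: int
    feedback_log: List[str] = field(default_factory=list)


def run_task(
    task: Task,
    execute: Callable[[Task, Optional[str]], str],  # returns run evidence
    review: Callable[[Task, str], Verdict],         # supervisor check
    open_pr: Callable[[Task], None],
    max_attempts: int = 3,                          # assumed cap, not from the post
) -> RunResult:
    """Run the task, let the supervisor review the evidence, retry or open a PR."""
    feedback: Optional[str] = None
    feedback_log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        evidence = execute(task, feedback)
        verdict = review(task, evidence)
        if verdict.approved:
            open_pr(task)
            return RunResult(True, attempt, feedback_log)
        # Bounced: feed the supervisor's notes into the next attempt.
        feedback = verdict.feedback
        feedback_log.append(feedback)
    return RunResult(False, max_attempts, feedback_log)


if __name__ == "__main__":
    # Stub executor: fails until it has received feedback once.
    def fake_execute(task: Task, feedback: Optional[str]) -> str:
        return "tests: pass" if feedback else "tests: fail"

    # Stub supervisor: approves only when the evidence shows passing tests.
    def fake_review(task: Task, evidence: str) -> Verdict:
        if "pass" in evidence:
            return Verdict(approved=True)
        return Verdict(approved=False, feedback="tests failing")

    demo = Task("add retry logic", ["all tests pass"])
    result = run_task(demo, fake_execute, fake_review, lambda t: None)
    print(result)
```

Passing the supervisor in as a callable is what makes swapping reviewers cheap in this sketch: the loop does not care whether `review` is backed by Claude or by Codex, only that it returns an approve-or-bounce verdict with feedback.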