TL;DR

AI agents tend to overestimate the quality of their own outputs when there is no external verification criterion. In subjective tasks (design, writing, UX, naming, strategy), simply asking the model to "reflect" is not enough: it often remains trapped in the same trajectory that produced the first plausible solution, leading to weak critiques and superficial improvements. Achieving real quality requires designing the runtime around the model: tests, rubrics, separate evaluators, external tools, and generator-evaluator loops that introduce critical distance between the system that produces the output and the one that approves it.

Why Internal Feedback Is Not Enough in Subjective Tasks

Sometimes, when you ask a model to evaluate a response it previously generated, it will rate it as good even when it clearly is not.…
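
To make the generator-evaluator loop from the TL;DR concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names `Verdict`, `evaluate`, `run_loop`, and `toy_generate` are hypothetical, the generator is a hard-coded stand-in for a model call, and the rubric is a toy one for a naming task. The point it tries to show is only the separation of roles: the code that approves an output is not the code that produced it, and it judges against criteria written down in advance.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    passed: bool
    failures: List[str]  # names of the rubric criteria that failed


# A rubric is a list of (criterion name, check) pairs.
# It is defined outside the generator, so the generator cannot grade itself.
Rubric = List[Tuple[str, Callable[[str], bool]]]


def evaluate(output: str, rubric: Rubric) -> Verdict:
    """Separate evaluator: applies every rubric check to the output."""
    failures = [name for name, check in rubric if not check(output)]
    return Verdict(passed=not failures, failures=failures)


def run_loop(
    generate: Callable[[List[str]], str],
    rubric: Rubric,
    max_rounds: int = 3,
) -> Tuple[str, Verdict, int]:
    """Generate, evaluate, and feed failed criteria back until the rubric
    passes or max_rounds is reached. Returns (draft, verdict, rounds used)."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        draft = generate(feedback)
        verdict = evaluate(draft, rubric)
        if verdict.passed:
            return draft, verdict, round_no
        feedback = verdict.failures  # external critique for the next attempt
    return draft, verdict, max_rounds


def toy_generate(feedback: List[str]) -> str:
    """Hypothetical stand-in for a model call on a product-naming task."""
    if "short" in feedback:
        return "Loopcheck"
    return "The Ultimate Feedback Verification Platform 3000"


# Toy rubric (assumed criteria, not taken from the article).
rubric: Rubric = [
    ("short", lambda s: len(s) <= 12),
    ("no_digits", lambda s: not any(c.isdigit() for c in s)),
]

draft, verdict, rounds = run_loop(toy_generate, rubric)
print(draft, verdict.passed, rounds)
```

In a real system, `generate` would presumably wrap a model call and the rubric checks could be tests, tools, or a second model with its own instructions. The sketch keeps them as plain functions so the control flow is easy to follow.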