Why I used three different critic roles instead of one (and what the eval taught me)


DEV Community: ai·Bohyeon Jang·1 day ago




I built Crucible over a weekend: three specialized critic agents that audit any LLM output in parallel, an adjudicator that synthesizes their critiques into a confidence-scored verdict, and an eval harness that measures whether the whole thing actually works better than just asking a single model to check itself. Here is what I learned, including the part where the honest answer is "not as much as I hoped."

The problem: a model cannot reliably audit its own blind spots

When a language model generates output, it has already committed to a direction. Ask it to self-review and it will often ratify the same confident mistake it just made, not because it is lazy, but because self-review activates the same internal heuristics that produced the error.…
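
To make the shape of the design from the opening concrete (several critics running in parallel, then an adjudicator turning their critiques into a confidence-scored verdict), here is a minimal sketch. It is not Crucible's actual code. The critic roles (`factual_critic`, `logic_critic`, `completeness_critic`), the `Critique` and `Verdict` types, the pass-fraction confidence score, and the 0.75 threshold are all assumptions made for illustration, and the critics are simple rule-based stand-ins where a real system would call a model.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Critique:
    role: str
    passed: bool
    note: str


@dataclass
class Verdict:
    confidence: float
    accepted: bool
    notes: List[str]


# Hypothetical critics: rule-based stand-ins for what would be LLM calls.
def factual_critic(text: str) -> Critique:
    lowered = text.lower()
    flagged = "always" in lowered or "never" in lowered
    return Critique("factual", not flagged, "absolute claim" if flagged else "ok")


def logic_critic(text: str) -> Critique:
    lowered = text.lower()
    flagged = "therefore" in lowered and "because" not in lowered
    return Critique("logic", not flagged, "conclusion without reason" if flagged else "ok")


def completeness_critic(text: str) -> Critique:
    flagged = len(text.split()) < 5
    return Critique("completeness", not flagged, "too short" if flagged else "ok")


def run_critics(text: str, critics: List[Callable[[str], Critique]]) -> List[Critique]:
    # Run every critic on the same text concurrently; map() keeps input order.
    with ThreadPoolExecutor(max_workers=len(critics)) as pool:
        return list(pool.map(lambda critic: critic(text), critics))


def adjudicate(critiques: List[Critique], threshold: float = 0.75) -> Verdict:
    # Assumed scoring rule: confidence is the fraction of critics that passed.
    passed = sum(1 for c in critiques if c.passed)
    confidence = passed / len(critiques)
    notes = [f"{c.role}: {c.note}" for c in critiques if not c.passed]
    return Verdict(confidence, confidence >= threshold, notes)
```

Calling `adjudicate(run_critics(text, [factual_critic, logic_critic, completeness_critic]))` yields one verdict per input. Keeping the critics as separate functions is what allows each one to be swapped for a differently prompted model, which is the point of using distinct roles instead of a single self-review pass.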
