on a feature where I had to transform 100 messy compliance PDFs into structured JSON rules. The brute-force approach was obvious: give the agent the source text, explain the task, provide examples, and ask it to generate the rules. Since it was the lowest-hanging fruit, I tried it first. At a glance, the output looked fine: the JSON was valid and matched the structure I expected. But when I manually sampled the results to check for accuracy, the cracks appeared. Some rules were too broad, and others were missing entirely. Some failed to preserve the nuances of the original text. I tried using another agent to catch and fix the errors, but with such a huge corpus it was impossible to verify the output with any confidence. That was the frustrating part: the errors were not obvious. The implementation was far too fragile to scale. Though I cannot share the exact implementation details, I can share the architectural lessons I learnt and how I eventually implemented it.…