TL;DR: RAG retrieved the right document. The LLM still contradicted it. That is the failure this system catches. It detects five failure patterns: numeric contradictions, fake citations, negation flips, answer drift, and confident-but-ungrounded responses. Three healing strategies fix bad answers in place before users see them. No external APIs, no LLM judge, no embeddings model: pure Python, in under 50 ms. There are 70 tests, and every production failure mode I found has a named assertion.

was lying (why I built this)

I’m building a RAG-powered assistant for EmiTechLogic, my tech education platform. The goal is simple: a learner asks a question, the system pulls from my tutorials and articles, and answers based on that content. The LLM output should not be generic. It should reflect my content, my explanations, what I’ve actually written.

Before putting that in front of real learners, I needed to test it properly. What I found was not what I expected. The retrieval was working fine. The right document was coming back.…