Been hitting a consistent problem across several deployments: LLM agents operate fine in testing but fail compliance review because there's no traceable decision log. The typical RAG setup gives you an answer and a source chunk. That's not enough for a healthcare or financial audit β the auditor wants to know which rule applied, what data it ran against, and a source citation they can verify independently. Approaches I've seen tried: - LangSmith / Langfuse tracing (good for debugging, not audit-grade provenance) - Custom logging middleware (works but becomes a maintenance burden fast) - GraphRAG (better structured recall, still no rule-level accountability) What I ended up doing was separating the reasoning layer entirely β a forward-chaining rule engine that evaluates YAML policies against a structured context graph, and writes W3C PROV-O provenance per answer. The PROV-O output is what actually satisfies compliance teams. Interested in what others have found.β¦