"It Feels Off" Is Not a Diagnosis You've deployed a RAG system. Users are saying the answers "aren't quite right." So you tweak the Prompt — feels a bit better. Then you switch Embedding models — better again. After a few rounds of this, you have no idea which change actually helped, and the next time it breaks, you're back to square one. This is the most common trap in RAG engineering: tuning by intuition without quantified diagnosis . The previous article built an evaluation framework with RAGAS and explained the 4 core metrics. This article turns those 4 metrics into a diagnostic toolkit — by deliberately inducing 3 classic failure modes, we can use data to pinpoint root causes instead of guessing. The Core Diagnostic Approach: A Decision Tree When a RAG system gives poor answers, the root cause falls into one of two categories: retrieval failed or generation failed .…