I Increased Retrieval From Top-5 to Top-20. My Answers Got Worse


DEV Community·Md Ayan Arshad·25 days ago


#result #ai #programming #reranker #condition #pool


The standard advice for improving RAG retrieval quality is: retrieve more candidates, then filter down. Bigger pool, better reranker, better answers.

I followed that advice in my RAG System. On PDFs, going from top-5 to top-20 made my RAGAS scores drop. The answers got worse, not better.

Here's what actually happened and the experiment design that explained it.

TL;DR

PDFs (40 QA pairs, 5 technical documents):

| Condition | RAGAS SUM | Context Precision |
|---|---|---|
| top-5, no reranker (baseline) | 3.4330 | 0.8102 |
| top-20, no reranker | 3.4051 ↓ | 0.8118 |
| top-20 → Cohere rerank → top-5 | 3.4843 ↑ | 0.8368 |

GitHub code (50 QA pairs, encode/httpx repo):

| Condition | RAGAS SUM | Context Precision |
|---|---|---|
| top-5, no reranker (baseline) | 3.5680 | 0.7812 |
| top-20, no reranker | 3.5766 | 0.7812 ← identical |
| top-20 → Cohere rerank → top-5 | 3.7079 ↑ | 0.9335 |

On PDFs, more candidates without a quality filter made scores drop. On code, a 4x larger pool produced zero improvement in Context Precision, i.e. 0.7812 versus 0.7812.…
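The third condition in both tables (top-20 → rerank → top-5) can be pictured with a minimal sketch like the one below. It is not the author's code, which this excerpt does not show. The names `retrieve_then_rerank`, `toy_retrieve`, `toy_score`, and `CORPUS` are hypothetical, and the word-overlap scorer is only a stand-in for a real reranker such as Cohere's, not a call to its API.

```python
from typing import Callable, List, Sequence

# Hypothetical signatures: a retriever returns up to k chunks for a query,
# a scorer returns one relevance score per chunk (higher is better).
Retriever = Callable[[str, int], List[str]]
Scorer = Callable[[str, Sequence[str]], List[float]]


def retrieve_then_rerank(
    query: str,
    retrieve: Retriever,
    score: Scorer,
    pool_size: int = 20,
    final_k: int = 5,
) -> List[str]:
    """Fetch a wide candidate pool, rescore it, keep only the best final_k."""
    candidates = retrieve(query, pool_size)
    scores = score(query, candidates)
    if len(scores) != len(candidates):
        raise ValueError("scorer must return one score per candidate")
    # sorted() is stable, so ties keep the original retrieval order.
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:final_k]]


# Toy data, for illustration only (not from the article's datasets).
CORPUS = [
    "introduction and table of contents",
    "installation steps for the tool",
    "how to configure a request timeout",
    "license and acknowledgements",
    "timeout defaults and retry behaviour",
    "changelog",
    "configure logging output",
    "request timeout errors and how to handle them",
]


def toy_retrieve(query: str, k: int) -> List[str]:
    # Stand-in for a vector search: returns the first k chunks, unranked.
    return CORPUS[:k]


def toy_score(query: str, chunks: Sequence[str]) -> List[float]:
    # Stand-in for a reranker: counts words shared between query and chunk.
    query_words = set(query.lower().split())
    return [float(len(query_words & set(c.lower().split()))) for c in chunks]


if __name__ == "__main__":
    question = "how to configure a request timeout"
    for chunk in retrieve_then_rerank(question, toy_retrieve, toy_score, final_k=3):
        print(chunk)
```

In this sketch the wider pool only helps because the scorer pushes relevant chunks to the front before the cut to `final_k`. Without that step, a top-20 condition hands every retrieved chunk to the generator, which matches the "no quality filter" case described above.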
