Two Retrieval Methods Are Better Than One: Evidence from 500 Clinical Queries

DEV Community·Igor Eduardo·19 days ago

#python #rag #ai #bm25 #dense #hybrid

When I set out to evaluate retrieval configurations for Portuguese clinical text, I expected one method to dominate. Instead, I found something more interesting: BM25 and dense retrieval answer different questions. Neither is a substitute for the other.

This post summarizes the methodology and results from a 500-query empirical study of hybrid retrieval for clinical question answering. All code is open source: https://github.com/nomad-link-id/hybrid-rag-pipeline

The Setup

500 clinical queries across 6 medical specialties (cardiology, endocrinology, infectious disease, nephrology, neurology, oncology). Each query has a single reference answer grounded in a specific passage from clinical documentation.

Four retrieval configurations were evaluated:

| Config | Method |
| --- | --- |
| BM25-only | BM25 with Portuguese stopword removal |
| Dense-only | BioBERTpt embeddings, cosine similarity |
| Hybrid-RRF | BM25 + dense via Reciprocal Rank Fusion |
| Hybrid-Rerank | RRF candidates re-ranked with cross-encoder |

What Is Reciprocal Rank Fusion?…
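For orientation, here is a minimal sketch of Reciprocal Rank Fusion as it is commonly defined: each document's fused score is the sum of 1 / (k + rank) over every ranking it appears in. This is not taken from the linked repository. The function name, the constant k = 60 (a frequently used default), and the document IDs are all assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is a common default, assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top-3 results from a lexical and a dense retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Because only ranks are used, the raw BM25 scores and cosine similarities never need to be normalized onto a shared scale, which is one reason this kind of fusion is a popular way to combine lexical and dense retrievers.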
