Two Retrieval Methods Are Better Than One: Evidence from 500 Clinical Queries

DEV Community·Igor Eduardo·19 days ago
#python #rag #ai #bm25 #dense #hybrid

When I set out to evaluate retrieval configurations for Portuguese clinical text, I expected one method to dominate. Instead, I found something more interesting: BM25 and dense retrieval answer different questions. Neither is a substitute for the other.

This post summarizes the methodology and results from a 500-query empirical study of hybrid retrieval for clinical question answering. All code is open source: https://github.com/nomad-link-id/hybrid-rag-pipeline

The Setup

The study uses 500 clinical queries across 6 medical specialties (cardiology, endocrinology, infectology, nephrology, neurology, oncology). Each query has a single reference answer grounded in a specific passage from clinical documentation.

Four retrieval configurations were evaluated:

| Config | Method |
| --- | --- |
| BM25-only | BM25 with Portuguese stopword removal |
| Dense-only | BioBERTpt embeddings, cosine similarity |
| Hybrid-RRF | BM25 + dense via Reciprocal Rank Fusion |
| Hybrid-Rerank | RRF candidates re-ranked with cross-encoder |

What Is Reciprocal Rank Fusion?…
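As a rough illustration of how the Hybrid-RRF configuration could combine its two rankings, here is a minimal sketch of Reciprocal Rank Fusion. It assumes the commonly used form, in which each document scores the sum of 1 / (k + rank) over every ranking it appears in, with k = 60. The function name, the document IDs, and the value of k are illustrative assumptions, not taken from the author's repository.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is a common default; an assumption here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top-3 results from each retriever for a single query.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Because the fusion uses only rank positions, it does not need BM25 scores and cosine similarities to be on a comparable scale. Documents found by both retrievers (`doc_a`, `doc_c`) rise above those found by only one.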
