You're doing RAG wrong


DEV Community·Venkata Manideep Patibandla·24 days ago


#ai #tutorial #rag #llm #retrieval #vector


There's a new approach that:

- cuts corpus size by 40x
- reduces tokens per query by 3x
- improves vector search relevance by 2.3x

And it doesn't touch your retrieval algorithm, your reranker, or your embedding model. It fixes something upstream that almost no one examines.

Every RAG pipeline starts with the same assumption: a chunk of text is the right unit of knowledge to embed.

That assumption is almost never examined. And it's the source of most of the retrieval failures people try to fix downstream.

Why the Chunk Is a Bad Unit

A chunk of text is a structurally neutral container. It knows nothing about:

- where its ideas begin or end
- which version of a document it came from
- who is allowed to see it

Since a chunk has no idea boundary, the splitter cuts wherever the token count runs out. You end up retrieving half a table, or a conclusion with no argument, or a claim stripped of the context that makes it true. The model has no way to know what’s missing.

The version problem is just as bad.…
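
To make the boundary problem concrete, here is a minimal sketch of a fixed-size splitter. It is an illustration under stated assumptions, not any particular library's implementation: `fixed_size_chunks` is a hypothetical helper, whitespace-separated words stand in for real tokenizer tokens, and the sample text is invented for the demo.

```python
def fixed_size_chunks(text: str, max_tokens: int) -> list[str]:
    """Naive splitter: cut every `max_tokens` tokens, ignoring idea boundaries.

    Assumption: whitespace-separated words approximate tokens. A real
    pipeline would use the embedding model's tokenizer instead.
    """
    tokens = text.split()
    return [
        " ".join(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


# Invented sample: a claim followed by the caveat that makes it true.
doc = (
    "Caching reduced median latency in our tests. "
    "However, this only holds when the cache is warm. "
    "On a cold start the cached path is slower."
)

chunks = fixed_size_chunks(doc, max_tokens=10)
for index, chunk in enumerate(chunks):
    print(index, repr(chunk))
```

With a 10-word budget, the first chunk ends at "However, this only", so the claim about latency is embedded without the condition it depends on, and the second chunk starts mid-sentence. Nothing in either chunk signals that text is missing.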
