Long Context vs RAG: When to Load the Whole Book


DEV Community·Jeff Reese·about 1 month ago




AI in Practice, No Fluff — Day 9/10

I have a project where every conversation and decision gets saved as a journal entry. Hundreds of entries, accumulated over weeks. When I need context from a previous session, I have two options: load every single entry into the AI's context window and ask my question, or use the embedding-based search from yesterday's post to retrieve just the relevant entries and pass only those in. Both work, and each has its tradeoffs. The choice between them is one of the most important architectural decisions in AI applications right now.

In the first series, we covered context windows (there is always a limit) and RAG (retrieve relevant information before generating a response). Today is where those two concepts collide. Context windows have gotten dramatically larger since that series. The question is no longer "can the AI hold all of this?" It often can. The question is whether it should.

The context window got big

Less than a year ago, 200,000 tokens was considered large.…
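To make the two options from the journal example concrete, here is a minimal sketch, not the project's actual code. The `Entry` class, the sample entries, and every function name are hypothetical. The word-overlap score is a toy stand-in for the embedding-based search, and the four-characters-per-token estimate is a rough assumption rather than a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical journal entry shape, for illustration only.
    title: str
    text: str


def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token for English text.
    return max(1, len(text) // 4)


def format_entries(entries: list[Entry]) -> str:
    # Join entries into one block of context text.
    return "\n\n".join(f"{e.title}\n{e.text}" for e in entries)


def load_all(entries: list[Entry]) -> str:
    # Option 1: put every journal entry into the context.
    return format_entries(entries)


def overlap_score(query: str, entry: Entry) -> int:
    # Toy relevance score: distinct query words that appear in the entry.
    # This stands in for embedding similarity and is far cruder.
    query_words = set(query.lower().split())
    entry_words = set(f"{entry.title} {entry.text}".lower().split())
    return len(query_words & entry_words)


def retrieve_top_k(entries: list[Entry], query: str, k: int = 2) -> str:
    # Option 2: keep only the k entries that score highest for the query.
    ranked = sorted(entries, key=lambda e: overlap_score(query, e), reverse=True)
    return format_entries(ranked[:k])


entries = [
    Entry("Day 1", "Chose SQLite for storage because the data set is small."),
    Entry("Day 2", "Sketched the landing page layout and color palette."),
    Entry("Day 3", "Decided to keep SQLite and add nightly backups for storage."),
]
query = "why did we choose sqlite for storage"

full_context = load_all(entries)
retrieved_context = retrieve_top_k(entries, query, k=2)

print("load everything:", estimate_tokens(full_context), "estimated tokens")
print("retrieve top 2: ", estimate_tokens(retrieved_context), "estimated tokens")
```

With three entries the difference is trivial. With hundreds of entries, the first option sends everything on every question, while the second sends a small slice and depends on the ranking picking the right entries.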
