(Image source: Google Blog - Gemini API File Search is now multimodal: build efficient, verifiable RAG)

Recap: RAG Finally Doesn't Have to Be Built Like Legos

For the past few years, whenever developers thought about RAG (Retrieval-Augmented Generation), the component list that came to mind probably looked like this:

- A chunker (LangChain? Write it yourself?)
- An embedding model (OpenAI text-embedding-3? Cohere? BGE?)
- A vector database (ChromaDB, FAISS, pgvector, Pinecone… choosing one is a battle of its own)
- A retrieval + rerank step
- And then the LLM

And multimodal RAG adds another layer of questions:

- How do you embed images?
- Do you need to run OCR first?
- Do you need two separate stores, one for text and one for images?
- How do you score results when a search mixes text and images?

These few questions alone can eat up a sprint.

Recently, Google announced Expanded Gemini API File Search for multimodal RAG on its developer blog, turning the long pipeline above into "calling a managed API", with images natively supported.…
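To make the contrast concrete, here is a minimal sketch of the hand-assembled pipeline from the component list above. It is not the Gemini API and not any particular library: the names (`chunk`, `embed`, `InMemoryStore`, `build_prompt`) are hypothetical, a bag-of-words counter stands in for a real embedding model, and a plain Python list stands in for a vector database. Reranking and the actual LLM call are left out.

```python
import math
import re
from collections import Counter


def chunk(text, size=12, overlap=2):
    """Split text into overlapping windows of `size` words (toy chunker)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text):
    """Stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryStore:
    """Stand-in for a vector database: brute-force search over a list."""

    def __init__(self):
        self.items = []  # list of (vector, chunk_text)

    def add(self, document):
        for piece in chunk(document):
            self.items.append((embed(piece), piece))

    def search(self, query, k=3):
        """Return the top-k (score, chunk_text) pairs, best first."""
        query_vec = embed(query)
        scored = [(cosine(query_vec, vec), text) for vec, text in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_prompt(question, hits):
    """Assemble the prompt that would be sent to the LLM (call not shown)."""
    context = "\n".join(f"- {text}" for _, text in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Even this toy version has four moving parts to wire together and tune, and it handles text only; every question in the multimodal list above would add more code on top of it.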