fastest way to understand what RAG is is to build the smallest version that actually works, run it on a real document, and look closely at what just happened. That’s this article. About a hundred lines of Python (no vector database, no framework, no agents) running on the Attention Is All You Need paper (Vaswani et al. 2017; arXiv non-exclusive distribution license, declared on the arXiv abstract page ), returning a sourced answer with the exact source lines highlighted on the page. Then we walk back through each block and ask the question it naturally raises. Each question is what a later article develops. The minimal pipeline is the smallest amount of code that respects the four bricks and produces a verifiable answer. Every later article adds capability the team needs after a specific failure on real documents, not because the architecture needed more layers. 1. What we’re building The pipeline has four bricks (Part II goes into each one in detail) plus a final, optional rendering step.…