Retrieval-Augmented Generation is now the default architecture for adding knowledge to LLM-powered applications. When a user asks a question, you retrieve relevant context from your own data, pass that context to the model alongside the question, and the model answers based on what you gave it.

Most RAG tutorials reach for LangChain immediately. This guide skips the framework and builds the pipeline from scratch: pgvector for vector storage, the OpenAI Python SDK for embeddings and generation, and psycopg for the database connection.

Originally published at rivestack.io

Database Setup

    CREATE EXTENSION IF NOT EXISTS vector;

    CREATE TABLE documents (
        id BIGSERIAL PRIMARY KEY,
        title TEXT NOT NULL,
        source TEXT
    );

    CREATE TABLE chunks (
        id BIGSERIAL PRIMARY KEY,
        document_id BIGINT REFERENCES documents (id) ON DELETE CASCADE,
        content TEXT NOT NULL,
        token_count INTEGER,
        embedding VECTOR(1536)
    );

    CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops)
        WITH (m = 16, ef_construction = 64);
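To make the retrieve-then-generate flow concrete before any database or API is involved, here is a minimal in-memory sketch. It is not the pipeline this guide builds: the function names (`to_pgvector_literal`, `cosine_distance`, `retrieve`, `build_prompt`), the toy 3-dimensional vectors, and the `TOP_K_SQL` string are all illustrative assumptions. The only facts it leans on are that pgvector's `<=>` operator orders by cosine distance (which is what an index built with `vector_cosine_ops` serves) and that pgvector accepts vectors in the text form `[x,y,z]`.

```python
import math

# Hypothetical query for the real pipeline, shown for orientation only and not
# executed here. <=> is pgvector's cosine-distance operator.
TOP_K_SQL = """
SELECT content
FROM chunks
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def to_pgvector_literal(embedding):
    """Format a list of floats in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(x) for x in embedding) + "]"


def cosine_distance(a, b):
    """Return 1 - cosine similarity, the quantity <=> orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def retrieve(query_embedding, chunks, k=2):
    """In-memory stand-in for the SQL above: the k nearest chunks by cosine distance."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_distance(query_embedding, c["embedding"]),
    )
    return [c["content"] for c in ranked[:k]]


def build_prompt(question, contexts):
    """Place the retrieved context ahead of the question."""
    context_block = "\n\n".join(contexts)
    return (
        "Answer using only this context:\n\n"
        f"{context_block}\n\n"
        f"Question: {question}"
    )


# Toy 3-dimensional vectors; the schema above stores 1536-dimensional ones.
chunks = [
    {"content": "pgvector adds a vector type to Postgres.",
     "embedding": [1.0, 0.0, 0.0]},
    {"content": "HNSW is an approximate nearest-neighbor index.",
     "embedding": [0.0, 1.0, 0.0]},
    {"content": "Bananas are rich in potassium.",
     "embedding": [0.0, 0.0, 1.0]},
]

query_embedding = [0.9, 0.1, 0.0]  # stands in for an embedded user question
top = retrieve(query_embedding, chunks, k=2)
prompt = build_prompt("What does pgvector do?", top)
print(prompt)
```

In a real setup, `retrieve` would presumably be replaced by running something like `TOP_K_SQL` through psycopg with `to_pgvector_literal(query_embedding)` as the first parameter, and `prompt` would be sent to the model. The sorting here is exact and linear in the number of chunks, whereas the HNSW index trades a little recall for much faster approximate search.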