Companion code: MukundaKatta/ragvitals-gemma-demo. The pgai + Ollama path is the demo/pgai_ollama_run.py entry point.

If you have ever shipped RAG to production, you know the discovery: the day you swap a model, change an embedder, or re-index the corpus, something will move that you did not expect. Faithfulness drops. Retrieval recall sags. Query intent shifts under your nose. Most of the time you find out from a support ticket, not a dashboard.

The point of this post is to show that you can build the whole pipeline on open source and still have the observability part be a first-class citizen, not an afterthought you bolt on later.

The stack:

- pgvector for vector storage
- pgai for embedding and generation, run inline as SQL functions
- Ollama for serving Gemma 2 9B (generator), Llama 3.1 8B (judge), and Nomic Embed (embedder), all locally
- ragvitals for a 5-dimensional drift report over every call

No managed embedding endpoint. No API key.…