Today I want to start a series of articles describing my experience building a multi-tenant RAG system powered by Postgres. It serves millions of documents while still delivering end-to-end responses in under 4 seconds, including the latency from AI providers. This article is the overview; in the coming weeks I will dive deeper into the individual topics.

I put a lot of research into most of the steps before I reached a reasonably stable and fast system. I was heavily involved in building this at my company, but I wasn't the only one, and many of the ideas came from working through problems together with the team. If you are thinking about building a RAG-based system, this series could help you make decisions about architecture and provider choice.

What makes a good RAG system? In my opinion, a good RAG system is mainly defined by recall and latency, because these two directly affect the end-user experience.…
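
To make the two criteria concrete, here is a minimal sketch of how recall and latency could be measured. It is not taken from the system described in this series: the function names, the document IDs, and the latency samples are all hypothetical, and only the 4-second budget comes from the text above. Recall is computed as recall@k (the share of known-relevant documents that show up in the top k results), and latency as a nearest-rank percentile over end-to-end response times.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Share of the relevant documents found in the top-k retrieved results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    hits = relevant & set(retrieved_ids[:k])
    return len(hits) / len(relevant)


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, which avoids float rounding issues.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical evaluation data, for illustration only.
retrieved = ["doc7", "doc2", "doc9", "doc4", "doc1"]  # ranked retriever output
relevant = ["doc2", "doc4", "doc8"]                   # hand-labelled ground truth
latencies_s = [2.1, 3.4, 2.8, 3.9, 2.5]               # end-to-end seconds per request

LATENCY_BUDGET_S = 4.0  # the "under 4 seconds" target mentioned above

print("recall@3:", recall_at_k(retrieved, relevant, 3))
print("recall@5:", recall_at_k(retrieved, relevant, 5))
p95 = percentile_nearest_rank(latencies_s, 95)
print("p95 latency within budget:", p95 < LATENCY_BUDGET_S)
```

Tracking a high percentile rather than the average is a deliberate choice in this sketch: a few slow responses are what users tend to notice, and an average would hide them.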