Book: RAG Pocket Guide: Retrieval, Chunking, and Reranking Patterns for Production Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools Me: xgabriel.com | GitHub A team I talked to last quarter was paying around eleven thousand dollars a month, by their account, to embed product reviews on text-embedding-3-small . Roughly two hundred million chunks, refreshed weekly. Their on-call engineer ran a spike on BGE-large-en-v1.5 with text-embeddings-inference on a single H100. He came back two days later. Same recall on their eval set, as he told it. Approximately seventy dollars a day in GPU time on a spot instance. The same week, a friend at a four-person startup did the opposite migration.…