Book: RAG Pocket Guide: Retrieval, Chunking, and Reranking Patterns for Production
Also by me: Thinking in Go (2-book series): Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub, an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

You inherit a vector index. Six million chunks at 1536 dimensions on text-embedding-3-small. The HNSW graph eats around 40 GB of RAM, the pgvector instance pages constantly, and the p99 query latency drifts up every time someone bulk-imports a tenant. The bill is fine; the infra is the problem.

A teammate shows you a paragraph in the OpenAI new embedding models post: the third-generation models support truncation. You can ask for 256-dim vectors directly, or take the 1536-dim vectors you already have and slice the first 256 floats off. According to the OpenAI announcement, retrieval quality on MTEB barely moves. Your index footprint drops by 6×.…
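
To make the slicing idea concrete, here is a minimal sketch in plain Python. It assumes vectors are stored as float32 (4 bytes per value) and that you compare them with cosine or dot-product similarity; the helper names `truncate_embedding` and `raw_vector_bytes` are made up for illustration and are not part of any library. One detail the one-line summary skips: the first 256 floats of a unit-length vector are generally no longer unit length, so the sketch rescales after slicing.

```python
import math


def truncate_embedding(vec, dims=256):
    """Keep the first `dims` values and rescale the result to unit length.

    Hypothetical helper: slicing alone shrinks the vector's norm, which
    skews dot-product scores, so we renormalize afterwards.
    """
    if dims > len(vec):
        raise ValueError("dims must not exceed the original vector length")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero slice cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


def raw_vector_bytes(n_vectors, dims, bytes_per_float=4):
    """Raw storage for the vectors only, assuming float32 values.

    This ignores HNSW graph links and row overhead, so treat it as a
    lower bound on the real index size.
    """
    return n_vectors * dims * bytes_per_float


full = raw_vector_bytes(6_000_000, 1536)
small = raw_vector_bytes(6_000_000, 256)
print(f"1536 dims: {full / 1e9:.1f} GB, 256 dims: {small / 1e9:.1f} GB")
```

The 6× figure applies to the raw vector data. The graph structure on top of it does not shrink with the dimension count, so the measured saving on a real index is likely to be somewhat smaller.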