Embedding Dimension Reduction: When 1536 → 256 Doesn't Hurt Recall


DEV Community·Gabriel Anhaia·25 days ago


#when #rag #ai #embeddings #embedding #matryoshka


Book: RAG Pocket Guide: Retrieval, Chunking, and Reranking Patterns for Production

Also by me: Thinking in Go (2-book series): Complete Guide to Go Programming + Hexagonal Architecture in Go

My project: Hermes IDE | GitHub, an IDE for developers who ship with Claude Code and other AI coding tools

Me: xgabriel.com | GitHub

You inherit a vector index. Six million chunks at 1536 dimensions on text-embedding-3-small. The HNSW graph eats around 40 GB of RAM, the pgvector instance pages constantly, and the p99 query latency drifts up every time someone bulk-imports a tenant. The bill is fine; the infra is the problem.

A teammate shows you a paragraph in the OpenAI new embedding models post: the third-generation models support truncation. You can ask for 256-dim vectors directly, or take the 1536-dim vectors you already have and keep only the first 256 floats of each. According to the OpenAI announcement, retrieval quality on MTEB barely moves. Your index footprint drops by 6×.…
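To make the second route and the 6× figure concrete, here is a minimal sketch in plain Python. The helper names `truncate_and_renormalize` and `raw_vector_bytes` are hypothetical, not part of any library. The renormalization step is an assumption based on the usual guidance for manually shortened embeddings: a sliced vector is no longer unit length, which matters if the index ranks by inner product. The size estimate counts raw float32 storage only, so it lands a little under the roughly 40 GB quoted above, which also includes the HNSW graph.

```python
import math


def truncate_and_renormalize(vec, dims=256):
    """Keep the first `dims` components and rescale them to unit L2 norm."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]


def raw_vector_bytes(n_vectors, dims, bytes_per_float=4):
    """Raw float32 payload only; graph links and row overhead are extra."""
    return n_vectors * dims * bytes_per_float


# Back-of-the-envelope footprint for the six-million-chunk index.
full = raw_vector_bytes(6_000_000, 1536)
small = raw_vector_bytes(6_000_000, 256)
print(f"1536-dim: {full / 1e9:.1f} GB, 256-dim: {small / 1e9:.1f} GB, "
      f"ratio {full / small:.0f}x")
```

Truncating stored vectors this way avoids re-embedding the corpus, but any recall claim should still be checked against your own queries before the index is rebuilt.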
