Embeddings on the Edge: sentence-transformers vs Hosted APIs

DEV Community·Gabriel Anhaia·27 days ago
#where #rag #embeddings #ai #embedding #large

Book: RAG Pocket Guide: Retrieval, Chunking, and Reranking Patterns for Production
Also by me: Thinking in Go (2-book series): Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub, an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

A team I talked to last quarter was paying around eleven thousand dollars a month, by their account, to embed product reviews on text-embedding-3-small. Roughly two hundred million chunks, refreshed weekly. Their on-call engineer ran a spike on BGE-large-en-v1.5 with text-embeddings-inference on a single H100. He came back two days later. Same recall on their eval set, as he told it. Approximately seventy dollars a day in GPU time on a spot instance.

The same week, a friend at a four-person startup did the opposite migration.…
