Embeddings on the Edge: sentence-transformers vs Hosted APIs

DEV Community·Gabriel Anhaia·27 days ago

#where #rag #embeddings #ai #embedding #large

Book: RAG Pocket Guide: Retrieval, Chunking, and Reranking Patterns for Production
Also by me: Thinking in Go (2-book series): Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub, an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

A team I talked to last quarter was paying around eleven thousand dollars a month, by their account, to embed product reviews on text-embedding-3-small. Roughly two hundred million chunks, refreshed weekly. Their on-call engineer ran a spike on BGE-large-en-v1.5 with text-embeddings-inference on a single H100. He came back two days later. Same recall on their eval set, as he told it. Approximately seventy dollars a day in GPU time on a spot instance.

The same week, a friend at a four-person startup did the opposite migration.…
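To compare the two figures from the first team's story, it helps to put them on the same monthly basis. The sketch below is only back-of-the-envelope arithmetic on the numbers as reported. It assumes a 30-day month and a GPU that runs every day, neither of which is stated in the account. The function and variable names are illustrative.

```python
# Rough monthly comparison of the two reported costs.
# Assumptions (not from the original account): 30-day month, GPU billed every day.

def monthly_cost_from_daily(daily_usd: float, days_per_month: int = 30) -> float:
    """Scale a daily cost to an approximate monthly cost."""
    return daily_usd * days_per_month

hosted_monthly_usd = 11_000.0   # "around eleven thousand dollars a month", as reported
self_hosted_daily_usd = 70.0    # "approximately seventy dollars a day", as reported

self_hosted_monthly_usd = monthly_cost_from_daily(self_hosted_daily_usd)
cost_ratio = hosted_monthly_usd / self_hosted_monthly_usd

print(f"Self-hosted, approx. per month: ${self_hosted_monthly_usd:,.0f}")
print(f"Hosted / self-hosted ratio: {cost_ratio:.1f}x")
```

Under those assumptions the self-hosted setup comes to roughly $2,100 a month, about a fifth of the hosted bill. The figure leaves out engineering time, on-call load, and spot-instance interruptions.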
