Building a fast LLM gateway in Go: Lua + pgvector

DEV Community: redis·Mushfiq Rahman·3 days ago

I recently open-sourced llm0, a Go binary that puts one OpenAI-compatible endpoint in front of OpenAI, Anthropic, Gemini, and local Ollama. It is MIT licensed and ships as a single binary plus Postgres and Redis.

The technically interesting part is how it stays fast: 3 ms p50 cache-hit latency, ~1,672 req/s sustained throughput, and 1–2 Redis round trips on the hot path, all on a DigitalOcean 4 vCPU / 8 GB shared Linux droplet.

This post walks through the architecture decisions that got those numbers. Expect Lua scripts, a pgvector query, and an honest discussion of where I overstated things and got corrected by a Redis engineer.

The naive approach (and why it's slow)

A typical LLM gateway request needs to do six things:

1. Authenticate the API key
2. Check the per-API-key rate limit
3. Check the per-project spend cap
4. Look up exact-match cache
5. (Maybe) check semantic cache
6. (Maybe) forward to the upstream model

The naive approach is six serial Redis GETs.…
