Multi-Tenant Token Budgets: Quota Patterns That Don't Starve Your Best Customers

DEV Community·Gabriel Anhaia·25 days ago

#pattern #observability #ai #llm #tenant #self

Book: LLM Observability Pocket Guide: Picking the Right Tracing & Evals Tools for Your Team

Also by me: Thinking in Go (2-book series): Complete Guide to Go Programming + Hexagonal Architecture in Go

My project: Hermes IDE | GitHub, an IDE for developers who ship with Claude Code and other AI coding tools

Me: xgabriel.com | GitHub

You ship a multi-tenant SaaS with an AI feature. One Anthropic key fans out to every customer (swap in OpenAI, Bedrock, Vertex; the shape is the same).

On Tuesday morning a single tenant burns through your minute-bucket. Usually it is the one running a backfill nobody told you about. The whole platform starts returning 429s. Your enterprise customer's CEO demo at 10 AM hits the rate limit two minutes in, because a free-tier tenant is replaying their support inbox through your summarizer.

A team I talked to had a worse version of this. They added "20 requests per minute per tenant" the first time it happened.…
