How multi-provider LLM routers silently fail

DEV Community·eleata team·25 days ago

A failure mode common to several Python LLM routers: a 429 caused by an exhausted long-period quota is treated identically to a 429 caused by a transient per-minute rate limit. The same cooldown TTL ends up applied to both, and for one of the two cases it is wrong by orders of magnitude. This essay describes the failure mode in concrete terms and outlines the small fix.

The shape of the failure

Most LLM routers track a single "this provider is unhealthy until X" field per deployment. When a request fails, the router sets X = now + cooldown, and is_call_allowed() returns False until then.

That works perfectly for a transient per-minute rate limit, where the cooldown is correctly measured in seconds. It works very badly for a monthly quota cap, where the provider resumes serving requests only when its billing period rolls over, possibly weeks away.…
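
The single-field mechanism described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of any particular router and not the fix the article goes on to describe. The class names, the 60-second RATE_LIMIT_COOLDOWN_S, and the quota_exhausted / quota_resets_at inputs are all hypothetical; is_call_allowed() only mirrors the name used in the text. The caller is also assumed to already know, from the provider's error payload, which kind of 429 it received. The sketch contrasts the naive single-field cooldown with a variant that tracks an exhausted quota separately.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cooldown for a transient per-minute rate limit, in seconds.
RATE_LIMIT_COOLDOWN_S = 60.0


@dataclass
class NaiveDeployment:
    """One 'unhealthy until X' field per deployment (hypothetical sketch)."""
    name: str
    unhealthy_until: float = 0.0

    def record_failure(self, now: float) -> None:
        # Every 429 gets the same short TTL, whatever actually caused it.
        self.unhealthy_until = now + RATE_LIMIT_COOLDOWN_S

    def is_call_allowed(self, now: float) -> bool:
        return now >= self.unhealthy_until


@dataclass
class QuotaAwareDeployment:
    """Hypothetical variant that keeps quota exhaustion separate from rate limits."""
    name: str
    rate_limited_until: float = 0.0
    quota_blocked_until: float = 0.0

    def record_429(self, now: float, quota_exhausted: bool,
                   quota_resets_at: Optional[float] = None) -> None:
        if quota_exhausted:
            # Assumption: the caller knows (or estimates) when the billing period
            # rolls over; without that, block until something resets it explicitly.
            self.quota_blocked_until = (
                quota_resets_at if quota_resets_at is not None else float("inf")
            )
        else:
            # Transient per-minute limit: a short, seconds-scale cooldown is right.
            self.rate_limited_until = now + RATE_LIMIT_COOLDOWN_S

    def is_call_allowed(self, now: float) -> bool:
        # Calls resume only once both kinds of block have expired.
        return now >= max(self.rate_limited_until, self.quota_blocked_until)


# A monthly quota is exhausted at t = 0; the router checks again two minutes later.
naive = NaiveDeployment("provider-a")
naive.record_failure(now=0.0)
print(naive.is_call_allowed(now=120.0))  # True: retried although the quota is still gone

aware = QuotaAwareDeployment("provider-a")
aware.record_429(now=0.0, quota_exhausted=True, quota_resets_at=30 * 24 * 3600.0)
print(aware.is_call_allowed(now=120.0))  # False: blocked until the assumed reset
```

Passing now explicitly instead of calling time.time() keeps the sketch deterministic and easy to test. A real router would still need a provider-specific way to tell the two kinds of 429 apart and to learn when the quota resets.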