ran this problem into the ground before finding something that works. writing it up because most content i found either covered prototyping scenarios or didn’t account for what happens at real production volume. the actual problem running DeepSeek V3 for cost-sensitive tasks, Qwen 2.5 for multilingual, GPT-4o for the things that need it. three providers, three sets of credentials, three rate limit systems, three billing accounts, three integrations that break on independent schedules. tried building a DIY routing layer. worked fine until DeepSeek pushed an API update on a Friday. spent the weekend fixing an integration that had nothing to do with our actual product. this happened twice. tried routing everything through aggregator tools. for DeepSeek and Qwen specifically the latency was higher than acceptable for our use case and the per-token pricing at our call volume was not competitive. Chinese models felt like an afterthought in the routing logic. what i ended up on Yotta Labs.…