This is a problem I spent longer on than I should have. We run a mixed model stack. DeepSeek V3 for the cost-sensitive tasks, Qwen 2.5 for multilingual, GPT-4o for the things that actually need it. On paper this sounds fine. In practice managing the API layer for all of these became a part-time job I didn’t sign up for. Separate credentials for each provider. Separate rate limit handling. Separate billing accounts to reconcile at the end of the month. Separate integrations that break independently when providers push updates. And Chinese model providers in particular have a different update cadence than western ones — I’ve had DeepSeek change something without warning twice in six months. The thing I kept running into was that solutions that worked well for western models didn’t work as well for Chinese ones. The latency going through certain routing layers was noticeably higher for DeepSeek and Qwen than for GPT and Claude. Pricing at the volume we’re running them wasn’t competitive either.…