TL;DR: Six weeks running an AI gateway between our edge cameras and three cloud VLM providers cut our pilot VLM spend by 58% and gave us actual failover during a 90-minute Anthropic blip last month. Bifrost handled it. Here's what worked, what didn't, and how it compared to LiteLLM and Portkey on the same workload. So, the thing is, when our team at a partner factory near Bologna wired up a defect inspection pilot with cloud VLMs in the loop, the cost story turned ugly within the first ten days. We had 28 stations, each catching anomalies from local event-camera and frame fusion, then escalating ambiguous frames to GPT-4o-mini or Claude Sonnet for a second opinion. The VLM bill landed at €4,800 in week one. Production was running 11 hours a day. Nobody had budgeted for that. The pilot also stalled twice. Once because OpenAI returned 429s for 22 minutes during what I assume was a regional capacity issue, and once because a key rotated wrong and half the fleet froze. Neither outage was the model's fault.…