How AI Gateway runs on Fluid compute

Vercel News·Malte Ubl·4 days ago

AI Gateway is a Node.js service for connecting to hundreds of AI models through a single interface. It processes billions of tokens per day. The secret behind that scale is Fluid.

When we announced its general availability, we highlighted how AI Gateway scales efficiently, routes requests securely, and simplifies connecting to multiple AI providers.

We looked at data from the first month of availability. AI Gateway handled roughly 16,000 total runtime hours, but only 1,200 of those hours involved actual CPU work (processing requests, routing logic, streaming responses). The remaining 14,800 hours were spent waiting for AI providers to respond.

Traditional serverless platforms bill you for wall clock time. Every millisecond your function is alive, you pay. With Fluid and Active CPU Pricing, you only pay CPU rates when the CPU is actually running. The rest of the time (when AI Gateway is waiting on OpenAI or Anthropic) you pay a lower memory-only rate.…
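
To make the billing difference concrete, here is a minimal sketch of the arithmetic using the hour counts quoted above. The two rates are hypothetical placeholders chosen only for illustration (they are not Vercel's published prices), and the model is deliberately simplified to follow the description above: CPU rate for active hours, a memory-only rate for waiting hours.

```typescript
// Hour counts quoted in the article for the first month of availability.
const totalHours = 16_000;
const activeCpuHours = 1_200;
const waitingHours = totalHours - activeCpuHours; // 14,800

// HYPOTHETICAL rates in cents per hour, for illustration only.
// These are assumptions, not real pricing.
const CPU_RATE_CENTS = 10;
const MEMORY_ONLY_RATE_CENTS = 1;

// Wall-clock billing: every hour the function is alive is charged at the CPU rate.
function wallClockCostCents(total: number, cpuRate: number): number {
  return total * cpuRate;
}

// Simplified active-CPU billing: CPU rate while the CPU is running,
// the lower memory-only rate while waiting on a provider.
function activeCpuCostCents(
  active: number,
  waiting: number,
  cpuRate: number,
  memoryOnlyRate: number,
): number {
  return active * cpuRate + waiting * memoryOnlyRate;
}

// Fraction of runtime that involved actual CPU work.
const cpuFraction = activeCpuHours / totalHours; // 0.075, i.e. 7.5%

const wallClock = wallClockCostCents(totalHours, CPU_RATE_CENTS);
const activeCpu = activeCpuCostCents(
  activeCpuHours,
  waitingHours,
  CPU_RATE_CENTS,
  MEMORY_ONLY_RATE_CENTS,
);

console.log({ waitingHours, cpuFraction, wallClock, activeCpu });
```

With these assumed rates, wall-clock billing comes to 160,000 cents and the active-CPU model to 26,800 cents. The exact savings depend entirely on the real rates, but the shape of the result follows from the data: only 7.5% of runtime hours involved CPU work.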
