How AI Gateway runs on Fluid compute

Vercel News·Malte Ubl·4 days ago

AI Gateway is a Node.js service for connecting to hundreds of AI models through a single interface. It processes billions of tokens per day. The secret behind that scale is Fluid.

When we announced its general availability, we highlighted how AI Gateway scales efficiently, routes requests securely, and simplifies connecting to multiple AI providers.

We looked at data from the first month of availability. AI Gateway handled roughly 16,000 total runtime hours, but only 1,200 of those hours involved actual CPU work (processing requests, routing logic, streaming responses). The remaining 14,800 hours were spent waiting for AI providers to respond.

Traditional serverless platforms bill you for wall-clock time. Every millisecond your function is alive, you pay. With Fluid and Active CPU Pricing, you only pay CPU rates when the CPU is actually running. The rest of the time (when AI Gateway is waiting on OpenAI or Anthropic) you pay a lower memory-only rate.…
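
To make the arithmetic concrete, here is a minimal sketch comparing the two billing models on the hours quoted above. The hourly rates are hypothetical placeholders chosen only for illustration, not Vercel's prices, and the model assumes idle hours are billed purely at a memory-only rate, which simplifies the description above. Rates are in integer cents to keep the arithmetic exact.

```python
# Figures quoted in the article for AI Gateway's first month.
TOTAL_RUNTIME_HOURS = 16_000
ACTIVE_CPU_HOURS = 1_200

# Hypothetical rates in cents per hour (assumptions, not real prices).
CPU_RATE_CENTS = 10
MEMORY_RATE_CENTS = 1


def idle_hours(total_hours: int, active_hours: int) -> int:
    """Hours spent waiting on upstream providers rather than using the CPU."""
    return total_hours - active_hours


def active_share(total_hours: int, active_hours: int) -> float:
    """Fraction of runtime that involved actual CPU work."""
    return active_hours / total_hours


def wall_clock_cost(total_hours: int, cpu_rate: int) -> int:
    """Wall-clock billing: every hour alive is charged at the CPU rate."""
    return total_hours * cpu_rate


def active_cpu_cost(total_hours: int, active_hours: int,
                    cpu_rate: int, memory_rate: int) -> int:
    """Simplified active-CPU billing: CPU rate while working,
    memory-only rate while waiting."""
    waiting = idle_hours(total_hours, active_hours)
    return active_hours * cpu_rate + waiting * memory_rate


if __name__ == "__main__":
    share = active_share(TOTAL_RUNTIME_HOURS, ACTIVE_CPU_HOURS)
    print(f"Active CPU share: {share:.1%}")
    print("Wall-clock cost (cents):",
          wall_clock_cost(TOTAL_RUNTIME_HOURS, CPU_RATE_CENTS))
    print("Active-CPU cost (cents):",
          active_cpu_cost(TOTAL_RUNTIME_HOURS, ACTIVE_CPU_HOURS,
                          CPU_RATE_CENTS, MEMORY_RATE_CENTS))
```

With the quoted hours, CPU work accounts for 7.5% of total runtime, so the gap between the two models depends almost entirely on how much cheaper the memory-only rate is than the CPU rate.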
