Building the foundation for running extra-large language models

The Cloudflare Blog · Michelle Chen, Kevin Flansburg, Vlad Krasnov

2026-04-16 · 8 min read

An agent needs to be powered by a large language model. A few weeks ago, we announced that Workers AI is officially entering the arena for hosting large open-source models like Moonshot’s Kimi K2.5. Since then, we’ve made Kimi K2.5 3x faster, and we have more model additions in flight. These models have been the backbone of many of the agentic products, harnesses, and tools that we have been launching this week.

Hosting AI models is an interesting challenge: it requires a delicate balance between software and very, very expensive hardware. At Cloudflare, we’re good at squeezing every bit of efficiency out of our hardware through clever software engineering. This is a deep dive into how we’re laying the foundation to run extra-large language models.

Hardware configurations

As we mentioned in our previous Kimi K2.5 blog post, we’re using a variety of hardware configurations in order to best serve models.…