that working with AI in production is pretty expensive. We all know this and we know most vendors are working pretty hard to figure out how to make agents cheaper. This is why I thought it was a good idea to go through a few design principles to keep in mind when you’re building, which can help you understand where you can grab some savings. We’ll go through how prompt caching works and why it’s a quick win, semantic caching, lazy-loading tools and MCPs, routing and cascading, delegating to subagents, and a bit on keeping the context clean. I am including interactive graphs throughout this article — that helps you visualize the cost savings each principle can get you based on the amount of tokens you are using. Yes, I am obviously staying real throughout, every saving comes with trade-offs. Agents get expensive as the context grows Your first agent might ship with a 500-token system prompt and two tools, but once it grows up, those numbers balloon fast.…