The moment everything with my AI agents fell apart, and how I fixed it as an infrastructure problem, not a model problem.

I feel quite ashamed, but I need to confess something. Three months ago, I sat staring at my Claude usage bill and I swear I felt my soul leave my body. I hit the token limits every single day for a month. And it was only me!

This was a huge defeat for me: I am a real fan of local AI and llama.cpp, and for the first time I was trying to see what would happen to my productivity with a top-tier LLM. And if this happened to me working alone, I cannot even imagine what it would look like on a team project at my company…

That’s when I knew: something had gone terribly wrong with how I’d set up my MCP infrastructure. And there is an additional embarrassing part: I had tried to connect only eight MCP servers. It was a test, an experiment, my way to understand what these damn MCPs are and whether they are really useful. But eight: that’s supposed to be simple, right?…