I've been running a multi-agent LangGraph pipeline for a few weeks now and kept hitting the same two walls: redundant API calls inflating costs, and agent crashes the moment any upstream service became flaky. My setup had 4 agents calling overlapping tools. Under real load, I was making 40+ calls per run to endpoints that returned the same data within a 30-minute window. My circuit-breaker logic was copy-pasted and inconsistent across tools. Observability was basically nonexistent — when something broke, I had no structured signal to debug from. I found ToolOps and the core idea is sound: treat every tool call the way infrastructure engineers treat microservice communication. The @readonly / @sideeffect decorator split is opinionated in a good way — it forces you to be explicit about whether a tool call is idempotent or not, which turns out to matter a lot when you're deciding what's safe to cache and retry.…