How CoinHawk runs a continuous AI security scan for every connected user using a single shared LLM call every 5 minutes.

## The dumb version doesn't scale

Imagine you want every user's dashboard to display a live "security score" produced by an LLM. The first instinct is:

```
GET /api/security/scan → call OpenAI → return result
```

If 1,000 users hit the dashboard, you make 1,000 LLM calls. At ~$0.02 per call you've burned $20 in 30 seconds, OpenAI rate-limits you, and your p95 latency is now whatever GPT feels like today.

The second instinct is to cache the response per user. That's a little better, but costs still scale linearly with users, and the cache invalidation logic gets ugly fast.

The real answer for monitoring-style features is dead simple: the scan is global, so run exactly one global scan and serve it to everyone. Below is the production pattern I shipped in CoinHawk's "Sentinel" feature.

## The mental model

The data is shared, not user-specific.…