AI security monitoring at scale: one LLM call, every dashboard

DEV Community·Heath Mcintyre·about 1 month ago

#ai #openai #architecture #scan #fullscreen #every

How CoinHawk runs a continuous AI security scan for every connected user using a single shared LLM call every 5 minutes.

The dumb version doesn't scale

Imagine you want every user's dashboard to display a live "security score" produced by an LLM. The first instinct is:

    GET /api/security/scan → call OpenAI → return result

If 1,000 users hit the dashboard, you make 1,000 LLM calls. At ~$0.02 per call you've burned $20 in 30 seconds, OpenAI rate-limits you, and your p95 latency is now whatever GPT feels like today.

The second instinct is to cache the response per-user. That's a little better, but you still scale costs linearly with users, and the cache invalidation logic gets ugly fast.

The real answer for monitoring-style features is dead simple: the scan is global, so make exactly one global scan and serve it to everyone. Below is the production pattern I shipped in CoinHawk's "Sentinel" feature.

The mental model

The data is shared, not user-specific.…
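To make the "one global scan, served to everyone" idea concrete, here is a minimal Python sketch. It is not CoinHawk's actual code: the `SharedScan` class, the `ScanResult` fields, and the injected `run_scan` callable (standing in for the real LLM call) are all hypothetical names chosen for illustration. The only details taken from the article are the shared result and the 5-minute interval.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class ScanResult:
    score: int           # hypothetical security score
    summary: str         # hypothetical short explanation from the model
    generated_at: float  # clock reading when the scan ran


class SharedScan:
    """Keeps one global scan result; refreshes it at most once per interval."""

    def __init__(
        self,
        run_scan: Callable[[], Tuple[int, str]],  # assumed wrapper around the LLM call
        interval_s: float = 300.0,                # 5 minutes, as in the article
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._run_scan = run_scan
        self._interval_s = interval_s
        self._clock = clock
        self._refresh_lock = threading.Lock()  # prevents two refreshes at once
        self._latest: Optional[ScanResult] = None

    def refresh_if_due(self) -> bool:
        """Run the expensive scan only if the cached result is missing or stale."""
        with self._refresh_lock:
            now = self._clock()
            fresh = (
                self._latest is not None
                and now - self._latest.generated_at < self._interval_s
            )
            if fresh:
                return False
            score, summary = self._run_scan()
            # Swap in the new result with a single assignment, so readers
            # always see either the old result or the new one.
            self._latest = ScanResult(score, summary, now)
            return True

    def latest(self) -> Optional[ScanResult]:
        """Cheap read for every dashboard request; never triggers an LLM call."""
        return self._latest
```

In a real service, a background task or scheduler would call `refresh_if_due()` on a timer, and the `GET /api/security/scan` handler would only return `latest()`. Under that assumption the LLM is called at most once per 5 minutes, which is 288 calls a day (about $5.76 at the article's ~$0.02 per call), no matter how many users load the dashboard.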
