Menu

Post image 1
Post image 2
1 / 2
0

How I Built an AI Agent That Handles On-Call Incidents and Pauses for Human Approval Before Touching Production

DEV Community·Krishna shakula·29 days ago
#sx4Erk35
#part#ai#fullscreen#state#graph#iras
Reading 0:00
15s threshold

The Problem It's 3 AM. PagerDuty fires. You drag yourself to your laptop. Open Grafana. Squint at a spike. Switch to Kibana, filter logs, grep for errors. Cross-reference a recent deployment. Form a hypothesis. Write a Slack message explaining what you found. Wait for someone to approve your fix. Apply it. Verify it worked. Then spend an hour writing a post-mortem that goes into a folder nobody opens. You do this for every incident. Every single time. I've been that engineer. So I built IRAS an Intelligent Incident Response Agent System that handles the full first-response lifecycle automatically, and only wakes you up to press Approve. Here's the architecture, the interesting engineering problems, and the decisions I'd make again (and the ones I wouldn't).…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Read More