My agent worked yesterday. Today it's possessed.

DEV Community · Ansh Saxena · 25 days ago

Two weeks of clean runs. Same prompts, same repo, same results. Then Tuesday happened.

The outputs were longer. Different variable names. Tool calls you'd never seen before. You asked the agent about it. It explained confidently. The explanation sounded plausible.

No stack trace. No error. No crash. Just behavior that used to be one thing and is now quietly something else.

This is the hardest failure to diagnose because you have nothing to point at. You have a feeling. A feeling is not a measurement.

Here's what five baseline sessions looked like:

Session 1: ~1,000 tokens | tools: [search, summarize]
Session 2: ~1,000 tokens | tools: [search, summarize]
Session 3: ~1,100 tokens | tools: [search, summarize]
Session 4: ~950 tokens | tools: [search, summarize]
Session 5: ~1,050 tokens | tools: [search, summarize]

Here's session 6:

Session 6: 50,000 tokens | tools: [fetch_url, parse_html, extract_entities, classify, store_results]
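One way to turn that feeling into a measurement is to compare each new session against the baseline on the two signals above: token count and the set of tools called. The sketch below is a minimal illustration, not the author's method. It assumes each session is logged as an approximate token count plus the tools it invoked. The `Session` record, the `check_drift` and `jaccard` helpers, and both thresholds (a z-score above 3 on tokens, a Jaccard overlap below 0.5 on tools) are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Session:
    """Hypothetical log record: approximate token count plus the tools the agent called."""
    tokens: int
    tools: frozenset


def jaccard(a: frozenset, b: frozenset) -> float:
    """Tool-set overlap: 1.0 means identical sets, 0.0 means no tools in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def check_drift(baseline: list, candidate: Session,
                z_limit: float = 3.0, min_overlap: float = 0.5) -> list:
    """Compare one session to the baseline; an empty result means nothing was flagged.

    The thresholds are illustrative assumptions, not recommended values.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline sessions to estimate spread")
    findings = []

    # Signal 1: how far is the token count from the baseline, in standard deviations?
    counts = [s.tokens for s in baseline]
    mu, sigma = mean(counts), stdev(counts)
    if sigma == 0:
        z = 0.0 if candidate.tokens == mu else float("inf")
    else:
        z = (candidate.tokens - mu) / sigma
    if abs(z) > z_limit:
        findings.append(f"tokens: {candidate.tokens} vs baseline mean {mu:.0f} (z={z:.1f})")

    # Signal 2: does the candidate use the same tools the baseline used?
    baseline_tools = frozenset().union(*(s.tools for s in baseline))
    overlap = jaccard(baseline_tools, candidate.tools)
    if overlap < min_overlap:
        findings.append(f"tools: overlap {overlap:.2f} with baseline {sorted(baseline_tools)}")

    # Tools never seen in any baseline session are worth listing explicitly.
    new_tools = candidate.tools - baseline_tools
    if new_tools:
        findings.append(f"new tools: {sorted(new_tools)}")
    return findings


# The sessions from the post (baseline counts are the approximate values shown there).
baseline = [
    Session(1_000, frozenset({"search", "summarize"})),
    Session(1_000, frozenset({"search", "summarize"})),
    Session(1_100, frozenset({"search", "summarize"})),
    Session(950, frozenset({"search", "summarize"})),
    Session(1_050, frozenset({"search", "summarize"})),
]
session_6 = Session(50_000, frozenset(
    {"fetch_url", "parse_html", "extract_entities", "classify", "store_results"}))

for finding in check_drift(baseline, session_6):
    print(finding)
```

Run against session 6, all three checks fire: the token count sits hundreds of standard deviations above the baseline mean, and none of its five tools appear in the baseline set. A z-score over only five samples is crude. It is used here because it is easy to reason about; with more history, a percentile band or a rolling window would likely be more robust.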