Have you ever told an AI 'never do this' and watched it do it anyway?

1 / 4

Have you ever told an AI 'never do this' and watched it do it anyway?

DEV Community·John Dreic·23 days ago

#Ni44R4x6

#ai #agents #showdev #discuss #agent #refund

Reading 0:00

15s threshold

I have. And the worst part is, the AI thought it was being helpful. I'd been worried about this for a while. Whenever you give an AI agent a tool that can write to something — your billing system, your customer database, whatever — the safety story usually goes: “I told it the rules in the system prompt. I gave it 16 rules. It'll be fine.” It's mostly fine. Until it isn't. The thing that bothers me about prompt rules — the rules you write into the system prompt — is that they're aspirational. The agent reads them. The agent intends to follow them. Most of the time it does. But the agent is the same thing being asked to write the response. There's no separation between the thing making the decision and the thing checking the decision. It's the AI grading its own homework. So a few days ago I built two versions of the same agent and stress-tested them. Both were Refund Approvers — they take a refund request, look up the transaction, and write a row into a refund history table.…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Create free account Log in

Menu

Have you ever told an AI 'never do this' and watched it do it anyway?