A user pasted a help article into our agent. Three minutes later the agent silently rewrote a customer email, leaked an internal URL, and tried to fetch a .zip from a domain none of us had ever seen. Nothing in the LLM was wrong. The problem was upstream. Retrieved text walked into the prompt with no inspection, and the agent treated it as gospel. I wrote up the lessons as a short preprint. The two npm libs below are the working code behind it. The two libs @mukundakatta/prompt-injection-shield A small-rule scanner for prompt-injection patterns in untrusted text. No heuristics, no ML, no weights. Just regex-grade rules with a typed risk_reasons array so you can log, gate, or strip lines. npm install @mukundakatta/prompt-injection-shield Enter fullscreen mode Exit fullscreen mode import { scan } from ' @mukundakatta/prompt-injection-shield ' ; const r = scan ( retrievedDoc ); if ( r . risk_score > 0 ) { console . warn ( ' blocked: ' , r .…