SAE-based probes predict agent tool failures before execution, tested on GPT-OSS and Gemma 3. The approach adds internal observability that current external methods lack.

Hariom Tatsat and Ariye Shater introduced probes built on sparse autoencoders (SAEs) that predict agent tool failures before execution. The paper, posted to arXiv on May 7, 2026, tests the probes on the GPT-OSS 20B and Gemma 3 27B models.

Key facts
- Posted to arXiv on May 7, 2026.
- Tested on the GPT-OSS 20B and Gemma 3 27B models.
- Trained on the NVIDIA Nemotron function-calling dataset.
- Two probes: Tool-Need and Tool-Risk (3 tiers).
- Uses SAEs and linear probes for pre-action inference.

The paper applies mechanistic interpretability to a practical problem: predicting when AI agents will misuse tools before they act.…
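
To make the idea of a linear probe over SAE features concrete, here is a minimal Python sketch. It is not the authors' implementation: the activations are synthetic, the encoder weights `W_ENC` and `B_ENC` are a hand-written toy rather than a trained sparse autoencoder, and the names `sae_encode`, `train_probe`, and `predict_risk` are hypothetical. It also uses a single binary "risky" label, not the three Tool-Risk tiers the paper describes.

```python
import math
import random

# Toy SAE encoder (assumption: hand-written, not a trained autoencoder).
# Each row reads one signed direction of a 4-dimensional activation.
W_ENC = [[1.0, 0.0, 0.0, 0.0],
         [-1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, -1.0, 0.0, 0.0]]
B_ENC = [0.0, 0.0, 0.0, 0.0]


def sae_encode(activation, w_enc=W_ENC, b_enc=B_ENC):
    """ReLU(W_enc @ x + b_enc): dense activation -> non-negative features."""
    return [max(0.0, sum(w * x for w, x in zip(row, activation)) + b)
            for row, b in zip(w_enc, b_enc)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(features, weights, bias):
    """Probe output: estimated probability that the pending tool call is risky."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


def train_probe(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression (linear) probe with batch gradient descent."""
    dim, n = len(features[0]), len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, y in zip(features, labels):
            err = predict_risk(f, weights, bias) - y
            for i, v in enumerate(f):
                grad_w[i] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Synthetic "pre-action" activations (assumption): risky examples are
# shifted along the first coordinate, the other coordinates are noise.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
activations = [[rng.gauss(1.5 if y else -1.5, 0.5)] +
               [rng.gauss(0.0, 1.0) for _ in range(3)]
               for y in labels]

features = [sae_encode(a) for a in activations]
weights, bias = train_probe(features, labels)

hits = sum((predict_risk(f, weights, bias) > 0.5) == bool(y)
           for f, y in zip(features, labels))
accuracy = hits / len(labels)
print(f"training accuracy of the toy probe: {accuracy:.2f}")
```

The point of the sketch is the ordering: the probe reads features computed from the model's internal state, so a risk score is available before any tool call is issued. In a real setting the features would come from an SAE trained on the model's activations, and the labels from observed tool-call outcomes.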