Key Takeaways Generic SaaS RFPs do not fit AI SRE. The category is younger than most procurement templates and the failure modes (hallucinated root causes, model drift, signal-type sensitivity) are not covered by traditional vendor checklists. Investigation quality is measurable. The RCAEval benchmark (Pham et al., December 2024, published at ACM Web Conference 2025 Companion Proceedings ) provides 735 fault-injection cases across three microservice systems with 11 fault types and 15 reproducible baselines. The NOFire AI benchmark extends this with a signal-type ladder showing Top-1 accuracy rises from 29 percent on metrics-only inputs to 77 percent when logs are added, 87 percent when traces are added, and 89 percent on full multi-modal telemetry with agentic reasoning. Trust is a separate axis from capability.…