ARMOR 2025 benchmark tests 21 LLMs against military legal doctrines, revealing critical safety gaps that civilian benchmarks miss.

ARMOR 2025, a new benchmark published on arXiv on April 30, 2026, evaluates 21 commercial LLMs against military legal doctrines. Its results indicate that existing safety benchmarks miss critical gaps in models' adherence to the Law of War and the Rules of Engagement.

Key facts
519 doctrinally grounded prompts in the benchmark
12-category taxonomy based on the OODA (observe, orient, decide, act) framework
21 commercial LLMs evaluated
Grounded in the Law of War, the Rules of Engagement, and the Joint Ethics Regulation
Published on arXiv April 30, 2026

The Doctrinal Gap

ARMOR 2025 targets a blind spot in LLM safety evaluation. Existing benchmarks such as MMLU or TruthfulQA test general social risks, but none measures whether models follow the legal and ethical rules that govern real military operations. The benchmark extracts doctrinal text from three core sources: the Law of War, the Rules of Engagement, and the Joint Ethics Regulation.…