Prompt injection is often discussed as a policy problem. In practice, it is a systems problem: model behavior, guardrail triggers, scoring quality, and human escalation all need to work together. This project packages those concerns into a game that developers can run locally, inspect, and extend. What We Built We built an interactive app where players try to jailbreak a protected AI agent. The app evaluates each attempt and records outcomes for analysis. The core modules: agent.py : protected agent and guardrail execution path eval.py : agent-as-judge scoring logic game.py : orchestration loop, replay, logs, leaderboard, HITL hook streamlit_app.py : operator-friendly UI for rapid attack testing Design Goals Keep the secret hidden even under adversarial prompting. Run guardrails before generation when SDK supports hooks. Produce explainable numeric scores per attempt. Capture enough telemetry for replay and regression testing.…