We Published Our AI Guardrail's 37% Detection Rate. Here's What We Learned.

DEV Community·nagasatish chilakamarti·27 days ago

#ai #security #detection #tealtiger #attack #garak

The numbers

We ran NVIDIA's Garak red team scanner against TealTiger, our open-source governance engine for AI agents. Results:

Benchmark                 Score
Garak jailbreak           40% detection
Garak prompt injection    40% detection
Garak data leakage        6.7% detection
PINT precision            85.7%
PINT recall               40%
PINT F1                   54.5%

Not great. We published them anyway.

Why publish bad numbers?

Because TealTiger's core claim is deterministic, auditable governance. If we can't be transparent about our own detection capabilities, why would anyone trust us to provide transparency for their AI agents?

The 85.7% precision is actually good: when we say DENY, we're almost always right. The problem is recall: we miss 60% of attacks.

What we learned from the 44 missed probes

We analyzed every probe that bypassed our guardrails.…
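As a quick sanity check on the PINT figures reported above, F1 is the harmonic mean of precision and recall, and the share of missed attacks is one minus recall. The sketch below is illustrative only and not part of TealTiger or PINT; the function name `f1_score` and the constants are assumptions introduced here, with the values taken from the reported percentages.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0  # avoid dividing by zero when nothing is detected
    return 2 * precision * recall / (precision + recall)


# Figures reported in the post, converted from percentages to fractions.
PINT_PRECISION = 0.857
PINT_RECALL = 0.40

f1 = f1_score(PINT_PRECISION, PINT_RECALL)
miss_rate = 1 - PINT_RECALL  # fraction of attacks that were not flagged

print(f"F1: {f1:.1%}")
print(f"Missed attacks: {miss_rate:.0%}")
```

The result agrees with the reported 54.5% F1 and the 60% miss rate. Because the harmonic mean is pulled toward the smaller of its two inputs, the low recall holds F1 down even though precision is high.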
