Built a prompt injection proxy that beats OpenAI Moderation and LlamaGuard — see it block attacks live

Reddit r/artificial·u/Turbulent-Tap6723·about 1 month ago

Built Arc Gate: it sits in front of any OpenAI-compatible endpoint and blocks prompt injection before it reaches your model.

Try it here (no signup, no code, no setup): https://web-production-6e47f.up.railway.app/try

Type any prompt and see whether it gets blocked or passes. The examples on the page show the difference.

The main detection layer is a behavioral SVM on sentence-transformer embeddings, which catches semantic intent, not just pattern matches. Phrase matching is just the fast first pass. Four layers total.

Benchmarked on 40 out-of-distribution (OOD) prompts (indirect, roleplay, hypothetical framings: the hard stuff):

• Arc Gate: Recall 0.90, F1 0.947
• OpenAI Moderation: Recall 0.75, F1 0.86
• LlamaGuard 3 8B: Recall 0.55, F1 0.71

Zero false positives on benign prompts, including security discussions and safe roleplay. Block latency 329 ms.…
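For anyone who wants to sanity-check the numbers: F1 is the harmonic mean of precision and recall, so the reported F1 values can be cross-checked against the reported recall. The sketch below assumes a precision of 1.0 for all three systems. The post only states zero false positives for the benchmark run and does not break precision out per system, so that part is an assumption, and the names `f1_score`, `ASSUMED_PRECISION`, and `reported` are just illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# ASSUMPTION: precision of 1.0 (no false positives) for every system.
# The post does not report per-system precision.
ASSUMED_PRECISION = 1.0

# (recall, F1) pairs exactly as reported in the post.
reported = {
    "Arc Gate": (0.90, 0.947),
    "OpenAI Moderation": (0.75, 0.86),
    "LlamaGuard 3 8B": (0.55, 0.71),
}

for name, (recall, reported_f1) in reported.items():
    implied = f1_score(ASSUMED_PRECISION, recall)
    print(f"{name}: implied F1 = {implied:.3f}, reported F1 = {reported_f1}")
```

Under that assumption the implied F1 values round to the reported ones, so the recall and F1 figures are at least internally consistent. This says nothing about how the 40 prompts were chosen or labeled.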

