Menu

#RolePlay

1 post

Feed
1 of 1 post
Arc Gate —LLM proxy that hits P=1.00 R=1.00 F1=1.00 on indirect/roleplay prompt injection (beats OpenAI Moderation and LlamaGuard)
📰
0

Arc Gate —LLM proxy that hits P=1.00 R=1.00 F1=1.00 on indirect/roleplay prompt injection (beats OpenAI Moderation and LlamaGuard)

Reddit r/artificial·u/Turbulent-Tap6723·about 1 month ago
#1zux7TsO

Benchmarked on 40 out-of-distribution prompts, indirect requests, roleplay framings, hypothetical scenarios, technical phrasings. The stuff that slips past everything else.…

15s
Read More