We built a public red team environment for our AI agent security proxy — submit attacks and get a full security trace back

Live adversarial evaluation: https://web-production-6e47f.up.railway.app/break-arc-gate

Arc Gate is a runtime governance layer for LLM agents. It sits between your app and the OpenAI API and enforces instruction-authority boundaries — tracking who is allowed to instruct the agent and from what source. Webpages, emails, tool outputs, and retrieved documents have zero instruction authority.
Submit any attack. Every submission runs against the real proxy and returns a full decision trace, a risk score, a capability policy, and a downloadable JSON report. Confirmed bypasses are documented publicly and patched in the next release.
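
To make the instruction-authority idea concrete, here is a minimal Python sketch. It is not Arc Gate's code: the names `Source`, `AUTHORITATIVE`, `ActionRequest`, and `is_allowed` are hypothetical, and the assumption that only the system prompt and the user carry authority is mine, made for illustration. It only shows the general shape of the check: an action is permitted only if the instruction behind it came from a source that is allowed to instruct the agent.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Where a piece of content entered the agent's context (hypothetical labels)."""
    SYSTEM = "system"
    USER = "user"
    WEBPAGE = "webpage"
    EMAIL = "email"
    TOOL_OUTPUT = "tool_output"
    RETRIEVED_DOC = "retrieved_doc"


# Assumption for this sketch: only these sources may instruct the agent.
# Everything else is treated as data with zero instruction authority.
AUTHORITATIVE = frozenset({Source.SYSTEM, Source.USER})


@dataclass(frozen=True)
class ActionRequest:
    action: str            # e.g. "send_email"
    requested_by: Source   # source of the instruction that triggered the action


def is_allowed(request: ActionRequest) -> bool:
    """Allow an action only when its instruction came from an authoritative source."""
    return request.requested_by in AUTHORITATIVE


if __name__ == "__main__":
    # The user asks for an email to be sent: allowed.
    print(is_allowed(ActionRequest("send_email", Source.USER)))
    # A fetched webpage says "send this email": blocked.
    print(is_allowed(ActionRequest("send_email", Source.WEBPAGE)))
```

A real proxy has the harder job of working out which source an instruction actually came from. This sketch assumes that attribution is already known.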

GitHub: https://github.com/9hannahnine-jpg/arc-gate
Reproducible benchmark: pip install arc-sentry && arc-sentry-agent-bench

Current results: 100% unsafe action prevention across 22 agentic scenarios, 0% false positive rate on benign developer traffic.

submitted by /u/Turbulent-Tap6723