/u/bmarti644 - Provide.ai

[R] Reference-model-free behavioral discovery of AuditBench model organisms via Probe-Mediated Adaptive Auditing

/u/bmarti644 / April 5, 2026

Anthropic's AuditBench comprises 56 Llama 3.3 70B models with planted hidden behaviors. Their best agent detects the behaviors 10–13% of the time (42% with a super-agent that aggregates many parallel runs). A central finding is the "tool-to-agent gap"…