AutoMonitor-Bench: Evaluating the Reliability of LLM-Based Misbehavior Monitor
arXiv:2601.05752v3
Abstract: We introduce AutoMonitor-Bench, the first benchmark designed to systematically evaluate the reliability of LLM-based misbehavior monitors across diverse tasks and failure modes. AutoMonitor-Bench con…