cs.CL, cs.SE

AutoMonitor-Bench: Evaluating the Reliability of LLM-Based Misbehavior Monitor

arXiv:2601.05752v3 Announce Type: replace
Abstract: We introduce AutoMonitor-Bench, the first benchmark designed to systematically evaluate the reliability of LLM-based misbehavior monitors across diverse tasks and failure modes. AutoMonitor-Bench con…