Detecting Safety Violations Across Many Agent Traces
arXiv:2604.11806v1 Announce Type: cross
Abstract: To identify safety violations, auditors often search over large sets of agent traces. This search is difficult because failures are often rare, complex, and sometimes even adversarially hidden and only…