cs.AI, cs.CR, cs.SE

SIR-Bench: Evaluating Investigation Depth in Security Incident Response Agents

arXiv:2604.12040v1 Announce Type: cross
Abstract: We present SIR-Bench, a benchmark of 794 test cases that evaluates autonomous security incident response agents and distinguishes genuine forensic investigation from alert parroting. Derived from 129 …