Daniel Begimher, Cristian Leo, Jack Huang, Pat Gaw, Bonan Zheng

SIR-Bench: Evaluating Investigation Depth in Security Incident Response Agents

April 15, 2026

arXiv:2604.12040v1 Announce Type: cross
Abstract: We present SIR-Bench, a benchmark of 794 test cases for evaluating autonomous security incident response agents that distinguishes genuine forensic investigation from alert parroting. Derived from 129 …
