Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps
arXiv:2604.19533v2 Announce Type: replace-cross
Abstract: We introduce the Cyber Defense Benchmark, which measures how well large language model (LLM) agents perform the core security operations center (SOC) analyst task of threat hunting: given a database of raw Windo…