Zuoyu Zhang, Yancheng Zhu

Enhancing Agent Safety Judgment: Controlled Benchmark Rewriting and Analogical Reasoning for Deceptive Out-of-Distribution Scenarios

Zuoyu Zhang, Yancheng Zhu / May 7, 2026

arXiv:2605.03242v1 Announce Type: new
Abstract: Tool-using agent systems powered by large language models (LLMs) are increasingly deployed across web, app, operating-system, and transactional environments. Yet existing safety benchmarks still emphasiz…
