Kushal Agrawal, Frank Xiao, Guido Bergman, Asa Cooper Stickland

Why Do Language Model Agents Whistleblow?

Kushal Agrawal, Frank Xiao, Guido Bergman, Asa Cooper Stickland / April 24, 2026

arXiv:2511.17085v3
Abstract: The deployment of Large Language Models (LLMs) as tool-using agents causes their alignment training to manifest in new ways. Recent work finds that language models can use tools in ways that co…
