Daniel Zhu, Zihan Wang, Jenny Bao, Jerry Wei

Jailbroken Frontier Models Retain Their Capabilities

Daniel Zhu, Zihan Wang, Jenny Bao, Jerry Wei / May 4, 2026

arXiv:2605.00267v1 Announce Type: cross
Abstract: As language model safeguards become more robust, attackers are pushed toward developing increasingly complex jailbreaks. Prior work has found that this complexity imposes a “jailbreak tax” that degrade…
