cs.AI, cs.CL, cs.CR

Jailbreaking Frontier Foundation Models Through Intention Deception

arXiv:2604.24082v1 Announce Type: cross
Abstract: Large (vision-)language models exhibit remarkable capabilities but remain highly susceptible to jailbreaking. Existing safety training approaches aim to have the model learn a refusal boundary between sa…