Jailbreaking Frontier Foundation Models Through Intention Deception
arXiv:2604.24082v1 Announce Type: cross
Abstract: Large (vision-)language models exhibit remarkable capabilities but remain highly susceptible to jailbreaking. Existing safety training approaches aim to have the model learn a refusal boundary between sa…