cs.AI, cs.CV, cs.LG

Jailbreaking Vision-Language Models Through the Visual Modality

arXiv:2605.00583v1 Announce Type: cross
Abstract: The visual modality of vision-language models (VLMs) is an underexplored attack surface for bypassing safety alignment. We introduce four jailbreak attacks exploiting the vision component: (1) encoding…