Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models
arXiv:2603.21697v2
Abstract: Multimodal Large Language Models (MLLMs) extend text-only LLMs with visual reasoning, but also introduce new safety failure modes under visually grounded instructions. We study comic-template j…