cs.AI, cs.CR, cs.MM

Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

arXiv:2603.21697v2 Announce Type: replace-cross
Abstract: Multimodal Large Language Models (MLLMs) extend text-only LLMs with visual reasoning, but also introduce new safety failure modes under visually grounded instructions. We study comic-template j…