cs.CV, cs.SD

A Systematic Study of Cross-Modal Typographic Attacks on Audio-Visual Reasoning

arXiv:2604.03995v1 Announce Type: new
Abstract: As audio-visual multi-modal large language models (MLLMs) are increasingly deployed in safety-critical applications, understanding their vulnerabilities is crucial. To this end, we introduce Multi-Modal …