cs.CV, cs.LG

High-Entropy Tokens as Multimodal Failure Points in Vision-Language Models

arXiv:2512.21815v2 Announce Type: replace
Abstract: Vision-language models (VLMs) achieve remarkable performance but remain vulnerable to adversarial attacks. Entropy, as a measure of model uncertainty, is highly correlated with VLM reliability. While…
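To make the notion of entropy as model uncertainty concrete, here is a minimal sketch of per-token Shannon entropy computed from next-token logits, plus a helper that flags the highest-entropy decoding steps. The abstract is truncated here and does not say how the paper defines or selects high-entropy tokens, so the function names, the `top_fraction` parameter, and the top-fraction selection rule are all illustrative assumptions, not the authors' method.

```python
import math


def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over one step's logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_positions(step_logits, top_fraction=0.2):
    """Return (indices of the highest-entropy steps, all per-step entropies).

    `top_fraction` is a hypothetical knob for this sketch; it is not taken
    from the paper.
    """
    entropies = [token_entropy(logits) for logits in step_logits]
    k = max(1, round(top_fraction * len(entropies)))
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k]), entropies
```

A uniform distribution over a vocabulary of size V gives the maximum entropy, log V, while a sharply peaked distribution gives an entropy close to zero, so the flagged positions are the steps where the model is least certain about the next token.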