High-Entropy Tokens as Multimodal Failure Points in Vision-Language Models
arXiv:2512.21815v2
Abstract: Vision-language models (VLMs) achieve remarkable performance but remain vulnerable to adversarial attacks. Entropy, as a measure of model uncertainty, is highly correlated with VLM reliability. While…
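
To make the abstract's notion of entropy as an uncertainty measure concrete, the following is a minimal sketch of per-token entropy. It assumes that "entropy" means the Shannon entropy of the softmax distribution over the vocabulary at each decoding step, which the visible part of the abstract does not confirm. The function names `token_entropy` and `high_entropy_positions` are hypothetical, introduced only for illustration, and are not taken from the paper.

```python
import math


def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over one step's logits.

    Assumption: uncertainty is measured on the next-token distribution.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_positions(step_logits, k):
    """Return the indices of the k decoding steps with the highest entropy."""
    entropies = [token_entropy(logits) for logits in step_logits]
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy logits for three decoding steps over a four-word vocabulary.
    steps = [
        [10.0, 0.0, 0.0, 0.0],  # confident prediction, low entropy
        [0.0, 0.0, 0.0, 0.0],   # uniform prediction, maximal entropy
        [5.0, 0.0, 0.0, 0.0],   # in between
    ]
    print(high_entropy_positions(steps, k=2))
```

In this toy setting the uniform step ranks first, since a uniform distribution over four tokens attains the maximum entropy of ln 4. Whether the paper selects tokens in this way is not stated in the text available here.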