When Surfaces Lie: Exploiting Wrinkle-Induced Attention Shift to Attack Vision-Language Models
arXiv:2603.27759v3 Announce Type: replace
Abstract: Vision-Language Models (VLMs) have demonstrated exceptional cross-modal understanding across various tasks, including zero-shot classification, image captioning, and visual question answering. Howeve…