Semantic Richness or Geometric Reasoning? The Fragility of VLM’s Visual Invariance
arXiv:2604.01848v2
Abstract: This work investigates the fundamental fragility of state-of-the-art Vision-Language Models (VLMs) under basic geometric transformations. While modern VLMs excel at semantic tasks such as recognizing…