BareBones: Benchmarking Zero-Shot Geometric Comprehension in VLMs
arXiv:2604.10528v2 Announce Type: replace
Abstract: While Vision-Language Models (VLMs) demonstrate remarkable zero-shot recognition capabilities across a diverse spectrum of multimodal tasks, it remains an open question whether these architecture…