cs.CV

Can Vision-Language Models Count? A Synthetic Benchmark and Analysis of Attention-Based Interventions

arXiv:2511.17722v3 Announce Type: replace
Abstract: Recent research suggests that Vision-Language Models (VLMs) often rely on inherent biases learned during training when responding to queries about visual properties of images. These biases are exacer…