cs.CV

Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation

arXiv:2504.14988v3 Announce Type: replace
Abstract: Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated remarkable multimodal perception capabilities, garnering significant attention. While numerous evaluation studies have em…