cs.AI, cs.CV

Can Multimodal Large Language Models Truly Understand Small Objects?

arXiv:2604.22884v1 Announce Type: cross
Abstract: Multimodal Large Language Models (MLLMs) have shown promising potential in diverse understanding tasks, e.g., image and video analysis, math and physics olympiads. However, they remain blank and unexpl…