Visual Funnel: Resolving Contextual Blindness in Multimodal Large Language Models
arXiv:2512.10362v2 Announce Type: replace-cross
Abstract: Multimodal Large Language Models (MLLMs) demonstrate impressive reasoning capabilities, but often fail to perceive fine-grained visual details, limiting their applicability in precision-demandi…