cs.AI, cs.CL, cs.CV

ResAdapt: Adaptive Resolution for Efficient Multimodal Reasoning

arXiv:2603.28610v1 Announce Type: cross
Abstract: Multimodal Large Language Models (MLLMs) achieve stronger visual understanding by scaling input fidelity, yet the resulting visual token growth makes jointly sustaining high spatial resolution and long…