ResAdapt: Adaptive Resolution for Efficient Multimodal Reasoning
arXiv:2603.28610v1 Announce Type: cross
Abstract: Multimodal Large Language Models (MLLMs) achieve stronger visual understanding by scaling input fidelity, yet the resulting visual token growth makes jointly sustaining high spatial resolution and long…
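The abstract attributes the cost of higher input fidelity to visual token growth. The sketch below shows how that count scales with resolution for a generic ViT-style encoder. It assumes the image is cut into non-overlapping square patches and that neighbouring patches may be merged into one token. The helper `visual_token_count`, the patch size of 14, and the `merge` factor are hypothetical illustration choices, not details taken from ResAdapt.

```python
import math


def visual_token_count(height: int, width: int, patch: int = 14, merge: int = 1) -> int:
    """Hypothetical helper: count visual tokens for a generic ViT-style encoder.

    Assumes the image is split into non-overlapping `patch` x `patch` tiles
    (partial tiles are padded, hence ceil), and that `merge` x `merge`
    neighbouring tiles are then fused into a single token.
    """
    grid_h = math.ceil(height / patch)  # patch rows
    grid_w = math.ceil(width / patch)   # patch columns
    return math.ceil(grid_h / merge) * math.ceil(grid_w / merge)


if __name__ == "__main__":
    # Doubling the side length quadruples the token count.
    for side in (224, 448, 896):
        print(side, visual_token_count(side, side))
```

Under these assumptions the loop prints 256, 1024 and 4096 tokens for 224, 448 and 896 pixel squares. Token count grows with pixel area, which is why raising spatial resolution quickly exhausts a fixed context budget.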