cs.AI, cs.CV

Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models

arXiv:2604.06912v1 Announce Type: cross
Abstract: MLLMs require high-resolution visual inputs for fine-grained tasks like document understanding and dense scene perception. However, current global resolution scaling paradigms indiscriminately flood th…