Su Ho Han, Jeongseok Hyun, Pilhyeon Lee, Minho Shim, Dongyoon Wee, Seon Joo Kim

Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation

Su Ho Han, Jeongseok Hyun, Pilhyeon Lee, Minho Shim, Dongyoon Wee, Seon Joo Kim / April 27, 2026

arXiv:2510.19592v2 Announce Type: replace
Abstract: Multimodal large language models (MLLMs) demonstrate strong video understanding by attending to visual tokens relevant to textual queries. To directly adapt this for localization in a training-free m…

Author name: Su Ho Han, Jeongseok Hyun, Pilhyeon Lee, Minho Shim, Dongyoon Wee, Seon Joo Kim

Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation