cs.CV

On the Nature of Attention Sink that Shapes Decoding Strategy in Omni-LLMs

arXiv:2603.14337v2 Announce Type: replace
Abstract: The goal of this paper is to strengthen the reasoning of Omnimodal Large Language Models (Omni-LLMs) at inference time, without additional training. These models jointly process video, audio, and tex…