Revisit What You See: Revealing Visual Semantics in Vision Tokens to Guide LVLM Decoding
arXiv:2506.09522v3 Announce Type: replace-cross
Abstract: Large Vision Language Models (LVLMs) achieve strong performance across multimodal tasks by integrating visual perception with language understanding. However, how vision information contributes…