When Sinks Help or Hurt: Unified Framework for Attention Sink in Large Vision-Language Models
arXiv:2604.03316v1 Announce Type: new
Abstract: Attention sinks are defined as tokens that attract disproportionate attention. While these have been studied in single-modality transformers, their cross-modal impact in Large Vision-Language Models (LVL…