On The Application of Linear Attention in Multimodal Transformers
arXiv:2604.10064v1 Announce Type: new
Abstract: Multimodal Transformers serve as the backbone for state-of-the-art vision-language models, yet their quadratic attention complexity remains a critical barrier to scalability. In this work, we investigate…
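The abstract is cut off before the paper's method is described, so nothing below reflects its actual formulation. As context for the complexity claim, here is a minimal NumPy sketch of the generic linearized-attention idea the title alludes to: replacing the softmax kernel exp(q·k) with a feature-map product φ(q)·φ(k) (here φ(x) = elu(x) + 1, following Katharopoulos et al., 2020, an assumption, not this paper's choice) lets the φ(K)ᵀV product be computed once, dropping the cost from O(N²) to O(N·d²) in sequence length N. All function names are illustrative.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: the N x N score matrix makes this O(N^2)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized (linear) attention sketch, assuming phi(x) = elu(x) + 1.

    Reassociating the product as phi(Q) @ (phi(K).T @ V) avoids forming
    the N x N matrix: cost is O(N * d * d_v) instead of O(N^2 * d).
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 > 0
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                        # (d, d_v) summary, independent of N
    Z = Qp @ Kp.sum(axis=0)[:, None]     # per-query normalizer, shape (N, 1)
    return (Qp @ KV) / (Z + eps)

# Both produce attention-shaped outputs; only the asymptotics differ.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 512, 64))
out = linear_attention(Q, K, V)          # (512, 64), computed in O(N)
```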