Rényi Attention Entropy for Patch Pruning
arXiv:2604.03803v1 Announce Type: new
Abstract: Transformers are strong baselines in both vision and language because self-attention captures long-range dependencies across tokens. However, the cost of self-attention grows quadratically with the number of tokens…
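The abstract is truncated here, so the paper's exact pruning criterion is not shown. As a minimal sketch of the general idea suggested by the title, the snippet below scores each query patch by the order-α Rényi entropy of its attention distribution and keeps a fixed fraction of patches. All names (`renyi_entropy`, `prune_patches`), the choice of α, and the decision to keep high-entropy queries are illustrative assumptions, not the paper's method.

```python
import torch

def renyi_entropy(p: torch.Tensor, alpha: float = 2.0, eps: float = 1e-12) -> torch.Tensor:
    """Rényi entropy H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha),
    computed over the last dimension of a probability tensor."""
    if abs(alpha - 1.0) < 1e-6:
        # The alpha -> 1 limit recovers the Shannon entropy.
        return -(p * (p + eps).log()).sum(dim=-1)
    return torch.log((p ** alpha).sum(dim=-1) + eps) / (1.0 - alpha)

def prune_patches(attn: torch.Tensor, keep_ratio: float = 0.5, alpha: float = 2.0) -> torch.Tensor:
    """attn: (batch, heads, n_patches, n_patches) attention weights.
    Returns indices of the patches to keep for each batch element.
    Keeping high-entropy queries is an assumption; the opposite choice
    (dropping diffuse, uninformative patches) is equally plausible."""
    # Score each query patch by its attention entropy, averaged over heads.
    scores = renyi_entropy(attn, alpha=alpha).mean(dim=1)   # (batch, n_patches)
    n_keep = max(1, int(keep_ratio * scores.shape[-1]))
    return scores.topk(n_keep, dim=-1).indices              # (batch, n_keep)

# Toy usage: random attention over 16 patches, keep a quarter of them.
attn = torch.softmax(torch.randn(2, 4, 16, 16), dim=-1)
keep_idx = prune_patches(attn, keep_ratio=0.25)
print(keep_idx.shape)  # torch.Size([2, 4])
```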