Advancing Vision Transformer with Enhanced Spatial Priors
arXiv:2604.18549v1 Announce Type: new
Abstract: In recent years, the Vision Transformer (ViT) has garnered significant attention within the computer vision community. However, the core component of ViT, Self-Attention, lacks explicit spatial priors an…
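The abstract's point that Self-Attention lacks explicit spatial priors can be illustrated with a small sketch. It is not taken from the paper and does not show the paper's method. It is a hypothetical, pure-Python single-head attention with identity query/key/value projections and no positional encoding (both simplifying assumptions). The names `self_attention`, `patches`, and `perm` are invented for the example. Under these assumptions, shuffling the patch order only shuffles the outputs in the same way, so the operation by itself carries no notion of where a patch sits in the image.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections.

    Assumption for illustration: no learned weights, no positional encoding.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product score of this query against every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


# Four hypothetical patch embeddings from a 2x2 grid, flattened row by row.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
perm = [2, 0, 3, 1]  # an arbitrary shuffle of the patch positions

out_original = self_attention(patches)
out_shuffled = self_attention([patches[i] for i in perm])

# Position i of the shuffled input holds original patch perm[i]; its output
# should match that patch's original output regardless of where it now sits.
equivariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for i, p in enumerate(perm)
    for a, b in zip(out_shuffled[i], out_original[p])
)
print(equivariant)
```

In practice, ViT-style models add positional embeddings or other position-dependent terms to break this symmetry. Which form of spatial prior the paper proposes is not stated in the visible part of the abstract.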