Elastic Attention Cores for Scalable Vision Transformers [R]
Wanted to share our latest paper on an alternative building block for Vision Transformers.

[Figure: illustration of our model's accuracy and dense features]

Traditional ViTs use dense O(N²) self-attention, which becomes quite costly at higher resolutions.
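To make the quadratic cost concrete, here is a minimal NumPy sketch of standard dense self-attention (the baseline being discussed, not our paper's method): the score matrix `q @ k.T` has shape N×N, so compute and memory grow quadratically with the token count N.

```python
import numpy as np

def dense_self_attention(x, w_q, w_k, w_v):
    """Standard dense self-attention. The score matrix is N x N,
    so memory and compute scale as O(N^2) in the token count N."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # shape (N, N): the quadratic term
    # numerically stable softmax over each row
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Example: a 1024x1024 image with 16x16 patches gives N = (1024 // 16)**2 = 4096
# tokens, so the attention matrix alone holds 4096**2 ~ 16.8M entries per head.
rng = np.random.default_rng(0)
N, d = 64, 32
x = rng.standard_normal((N, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out = dense_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (64, 32)
```

This is why sparse, linear, or (as here) elastic alternatives to the dense attention core become attractive once resolution grows.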