Attention Transfer Is Not Universally Effective for Vision Transformers
arXiv:2605.07191v1 Announce Type: new
Abstract: Recent work shows that Attention Transfer, which transfers only the attention patterns from a pre-trained teacher Vision Transformer (ViT) to a randomly initialized standard student ViT, is sufficient …