cs.CV, cs.LG

Attention Transfer Is Not Universally Effective for Vision Transformers

arXiv:2605.07191v1 Announce Type: new
Abstract: Recent work shows that Attention Transfer, which transfers only the attention patterns from a pre-trained teacher Vision Transformer (ViT) to a randomly initialized standard student ViT, is sufficient …