Revisiting Anisotropy in Language Transformers: The Geometry of Learning Dynamics
arXiv:2604.08764v1
Abstract: Since their introduction, Transformer architectures have dominated Natural Language Processing (NLP). However, recent research has highlighted an inherent anisotropy phenomenon in these models, presentin…