cs.CL, math.DG

Revisiting Anisotropy in Language Transformers: The Geometry of Learning Dynamics

arXiv:2604.08764v1 Announce Type: new
Abstract: Since their introduction, Transformer architectures have dominated Natural Language Processing (NLP). However, recent research has highlighted an inherent anisotropy phenomenon in these models, presentin…