cs.LG, stat.ML

Gating Enables Curvature: A Geometric Expressivity Gap in Attention

arXiv:2604.14702v1 Announce Type: cross
Abstract: Multiplicative gating is widely used in neural architectures and has recently been applied to attention layers to improve performance and training stability in large language models. Despite the succes…