Beyond Linearity in Attention Projections: The Case for Nonlinear Queries
arXiv:2603.13381v2
Abstract: Recent algebraic analysis shows that in decoder-only and encoder-only transformers, the Query projection $W_Q$ may be set to identity without noticeable performance deterioration. This is possi…
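As an illustration of how $W_Q$ can be removed without changing attention scores, the following is a minimal NumPy sketch of the simplest case. It is an assumption-laden toy and not necessarily the argument the abstract refers to: it assumes a single head, a square $W_Q$ of size `d_model x d_model`, and no biases or positional terms. Under those assumptions the logits $(XW_Q)(XW_K)^\top$ depend only on the product $W_Q W_K^\top$, so $W_Q$ can be folded into the key projection. The names `attention_scores` and `w_k_folded` are hypothetical and chosen only for this example.

```python
import numpy as np

def attention_scores(x, w_q, w_k):
    """Unnormalised single-head attention logits (X W_Q)(X W_K)^T."""
    return (x @ w_q) @ (x @ w_k).T

rng = np.random.default_rng(0)
n_tokens, d_model = 5, 8  # toy sizes, chosen arbitrarily

x = rng.standard_normal((n_tokens, d_model))    # token representations
w_q = rng.standard_normal((d_model, d_model))   # square query projection (assumption)
w_k = rng.standard_normal((d_model, d_model))   # square key projection (assumption)

# Logits with both projections present.
original = attention_scores(x, w_q, w_k)

# Fold W_Q into the key projection: W_K' = W_K W_Q^T, then use W_Q = I.
w_k_folded = w_k @ w_q.T
folded = attention_scores(x, np.eye(d_model), w_k_folded)

# Both parameterisations give the same logits up to floating-point error.
scores_match = np.allclose(original, folded)
print("scores match:", scores_match)
```

The sketch only covers attention logits in one isolated layer; multi-head attention with per-head dimension smaller than `d_model`, and the interaction with surrounding layers, need a more careful treatment than this toy provides.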