cs.LG

Projection-Free Transformers via Gaussian Kernel Attention

arXiv:2605.02144v1 Announce Type: new
Abstract: Self-attention in Transformers is typically implemented as $\mathrm{softmax}(QK^\top/\sqrt{d})V$, where $Q=XW_Q$, $K=XW_K$, and $V=XW_V$ are learned linear projections of the input $X$. We ask whether th…
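To make the baseline concrete, below is a minimal NumPy sketch of the standard self-attention defined in the abstract, alongside one plausible projection-free reading of the title. Because the abstract is truncated before the method is described, `gaussian_kernel_attention`, its Gaussian-kernel weighting of the raw inputs, and the bandwidth parameter `sigma` are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(X, W_Q, W_K, W_V):
    """Standard self-attention from the abstract: softmax(QK^T / sqrt(d)) V,
    with Q = X W_Q, K = X W_K, V = X W_V learned linear projections of X."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # pairwise dot-product similarities
    return softmax(scores, axis=-1) @ V

def gaussian_kernel_attention(X, sigma=1.0):
    """Hypothetical projection-free variant (an assumption based on the title):
    attention weights come from a Gaussian kernel on the raw inputs,
    softmax(-||x_i - x_j||^2 / (2 sigma^2)), applied directly to X with no
    W_Q, W_K, or W_V. `sigma` is an assumed bandwidth hyperparameter."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    weights = softmax(-sq_dists / (2.0 * sigma**2), axis=-1)
    return weights @ X

# Example usage on random data: 5 tokens, model dimension 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W = [rng.normal(size=(8, 8)) for _ in range(3)]
out_std = standard_attention(X, *W)          # shape (5, 8)
out_gk = gaussian_kernel_attention(X)        # shape (5, 8)
```

Note that a row-wise softmax over negative squared distances is exactly a row-normalized Gaussian kernel, so the variant above contains no learned projection matrices at all; whether the paper keeps a value projection or other learned components cannot be determined from the truncated abstract.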