Laplacian Heads Improve Transformers by Smoothing Token Representations
arXiv:2602.09297v2 Announce Type: replace
Abstract: Transformers update token representations through multi-head attention and residual connections as $X \leftarrow X + \sum_{i} P^{(i)} X W_{V_i} W_{O_i}$, where $P^{(i)}$ is the softmax attention matrix …
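The abstract's update rule can be made concrete in a few lines of NumPy. The sketch below is illustrative only: the per-head shapes, the helper names, and the way the attention matrices $P^{(i)}$ are formed from query/key projections are standard-transformer assumptions, not details taken from the paper, and the Laplacian-head construction itself is not shown.

```python
# Minimal sketch of the residual multi-head attention update
#   X <- X + sum_i P^(i) X W_{V_i} W_{O_i}
# Shapes and the query/key construction of P^(i) are assumptions;
# the paper's Laplacian heads are not reproduced here.
import numpy as np

def softmax(scores, axis=-1):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = scores - scores.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def residual_mha_update(X, W_Q, W_K, W_V, W_O):
    """One residual multi-head attention update.

    X:             (n, d)       token representations
    W_Q, W_K, W_V: (h, d, d_k)  per-head input projections
    W_O:           (h, d_k, d)  per-head output projections
    """
    h, _, d_k = W_V.shape
    update = np.zeros_like(X)
    for i in range(h):
        Q, K = X @ W_Q[i], X @ W_K[i]         # (n, d_k) queries and keys
        P = softmax(Q @ K.T / np.sqrt(d_k))   # (n, n) softmax attention matrix P^(i)
        update += P @ X @ W_V[i] @ W_O[i]     # P^(i) X W_{V_i} W_{O_i}
    return X + update                         # residual connection

# Usage: random weights for n=4 tokens, d=8 model dim, h=2 heads, d_k=4
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_Q, W_K, W_V = (rng.normal(size=(2, 8, 4)) for _ in range(3))
W_O = rng.normal(size=(2, 4, 8))
print(residual_mha_update(X, W_Q, W_K, W_V, W_O).shape)  # (4, 8)
```

Since each $P^{(i)}$ is row-stochastic, the term $P^{(i)}X$ replaces every token with a convex combination of tokens, which is the averaging (smoothing) behavior the title refers to.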