Gradient Flow Structure and Quantitative Dynamics of Multi-Head Self-Attention
arXiv:2605.04279v1 Announce Type: new
Abstract: Transformer self-attention can be interpreted as a gradient flow on the unit sphere, in which tokens evolve under softmax interaction potentials and tend to form clusters. While prior work has establishe…
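The gradient-flow picture described in the abstract can be illustrated with a small numerical sketch. The following is a minimal simulation of the standard interaction-particle model of self-attention on the unit sphere (not the paper's exact formulation): each token drifts toward the softmax-weighted average of all tokens, projected onto the tangent space of the sphere, and the population clusters. The inverse temperature `beta`, step size `dt`, and step count are illustrative choices, not values from the paper.

```python
import numpy as np

def attention_flow(X, beta=1.0, dt=0.1, steps=500):
    """Euler simulation of spherical attention dynamics:
        dx_i/dt = P_{x_i}( sum_j softmax_j(beta <x_i, x_j>) x_j ),
    where P_x projects a vector onto the tangent space of the sphere at x.
    Parameters here are illustrative, not taken from the paper."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    for _ in range(steps):
        logits = beta * (X @ X.T)                      # pairwise inner products
        W = np.exp(logits - logits.max(axis=1, keepdims=True))
        W /= W.sum(axis=1, keepdims=True)              # row-wise softmax weights
        V = W @ X                                      # attention-weighted averages
        V -= np.sum(V * X, axis=1, keepdims=True) * X  # tangent-space projection
        X = X + dt * V
        X /= np.linalg.norm(X, axis=1, keepdims=True)  # retract back to the sphere
    return X

rng = np.random.default_rng(0)
X0 = rng.standard_normal((16, 3))
Xf = attention_flow(X0)
# pairwise cosine similarities concentrate near 1 as the tokens cluster
print(np.min(Xf @ Xf.T))
```

With a moderate `beta`, random initial tokens collapse toward a single point on the sphere, which is the clustering behavior the abstract refers to; larger `beta` sharpens the softmax and can instead produce long-lived metastable multi-cluster states.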