Quantitative Clustering in Mean-Field Transformer Models
arXiv:2504.14697v3
Abstract: The evolution of tokens through deep transformer models can be modeled as an interacting particle system that has been shown to exhibit an asymptotic clustering behavior akin to the synchroniza…
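The particle-system view mentioned in the abstract can be illustrated with a small simulation. The listing does not state the model equations, so everything below is an assumption: the sketch uses a commonly studied form of self-attention dynamics in which each token is a point on the unit sphere and moves toward a softmax-weighted average of all tokens, projected onto the tangent space. The names `attention_step` and `min_pairwise_cosine` and the parameters `beta` and `dt` are hypothetical, chosen for illustration only; this is not the paper's method.

```python
import numpy as np


def attention_step(x, beta=1.0, dt=0.1):
    """One explicit Euler step of a toy self-attention particle system.

    x has shape (n, d); each row is a token on the unit sphere.
    ASSUMPTION: the drift is the softmax-weighted mean of all tokens,
    projected onto the tangent space at each token.
    """
    logits = beta * (x @ x.T)                      # pairwise inner products
    logits -= logits.max(axis=1, keepdims=True)    # stabilise the softmax
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)              # row-wise attention weights
    drift = w @ x                                  # weighted average per token
    # Remove the radial component so the motion stays tangent to the sphere.
    drift -= np.sum(drift * x, axis=1, keepdims=True) * x
    x_new = x + dt * drift
    # Renormalise to correct the drift off the sphere caused by the Euler step.
    return x_new / np.linalg.norm(x_new, axis=1, keepdims=True)


def min_pairwise_cosine(x):
    """Smallest cosine similarity between any two tokens (1.0 means one cluster)."""
    return float((x @ x.T).min())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(16, 3))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    print("initial min cosine:", min_pairwise_cosine(x))
    for _ in range(500):
        x = attention_step(x)
    print("final min cosine:  ", min_pairwise_cosine(x))
```

Tracking the smallest pairwise cosine gives a simple scalar diagnostic: it rises toward 1 as the tokens collapse toward a single cluster. The step size, the number of steps, and the value of `beta` are arbitrary illustrative choices.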