Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models
arXiv:2604.26898v1 Announce Type: cross
Abstract: We prove pathwise convergence of the layerwise evolution of tokens in a finite-depth, finite-width transformer model with multilayer perceptron (MLP) blocks to a continuous-time stochastic interacting …