Homogenized Transformers
arXiv:2604.01978v1 Announce Type: cross
Abstract: We study a random model of deep multi-head self-attention in which the weights are resampled independently across layers and heads, as at initialization of training. Viewing depth as a time variable, t…
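The random model described — deep multi-head self-attention whose weights are drawn fresh for every layer and head, as at initialization — can be sketched minimally in numpy. This is an illustrative reconstruction, not the paper's code: the Gaussian, Xavier-style weight scale, the absence of residual connections and layer norm, and all function names are assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def random_mhsa_layer(X, n_heads, rng):
    """One multi-head self-attention layer with freshly sampled weights.

    X: (n_tokens, d) token matrix. Weights are resampled independently
    for every layer and every head, mimicking the state at initialization.
    """
    n, d = X.shape
    d_h = d // n_heads
    out = np.zeros_like(X)
    for h in range(n_heads):
        # Xavier-style Gaussian scale is an assumption of this sketch.
        Wq = rng.normal(0.0, 1.0 / np.sqrt(d), (d, d_h))
        Wk = rng.normal(0.0, 1.0 / np.sqrt(d), (d, d_h))
        Wv = rng.normal(0.0, 1.0 / np.sqrt(d), (d, d_h))
        Wo = rng.normal(0.0, 1.0 / np.sqrt(d_h), (d_h, d))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(d_h))  # attention over tokens
        out += A @ V @ Wo                    # sum head outputs
    return out

def run_depth(X, depth, n_heads=4, seed=0):
    """Iterate random layers; depth plays the role of a time variable."""
    rng = np.random.default_rng(seed)
    for _ in range(depth):
        X = random_mhsa_layer(X, n_heads, rng)
    return X

tokens = np.random.default_rng(1).normal(size=(8, 16))
Y = run_depth(tokens, depth=12)
print(Y.shape)  # -> (8, 16)
```

Iterating `run_depth` for growing `depth` is one way to probe the depth-as-time viewpoint the abstract alludes to, e.g. by tracking statistics of `Y` as the number of layers increases.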