Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers
arXiv:2605.06169v1 Announce Type: cross
Abstract: Scaling Diffusion Transformers (DiTs) to hundreds of layers introduces a structural vulnerability: networks can enter a silent, mean-dominated collapse state that homogenizes token representations and …
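To make "mean-dominated collapse" concrete, here is a minimal sketch of one possible diagnostic. It assumes that a layer's hidden states are a list of N token vectors. It splits those vectors into their shared mean and the per-token residuals, then reports how much of the average per-token energy sits in the mean. The names `split_mean_residual` and `mean_mode_fraction` are hypothetical. This is not the paper's method, which the truncated abstract does not describe. A value near 1.0 means the tokens have become nearly identical, which is the homogenization the abstract warns about.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def split_mean_residual(tokens: Sequence[Sequence[float]]) -> Tuple[Vector, List[Vector]]:
    """Split token vectors into their shared mean and per-token residuals.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    dim = len(tokens[0])
    # Mean over the token axis, computed separately for each feature dimension.
    mean = [sum(tok[j] for tok in tokens) / n for j in range(dim)]
    # What remains after removing the shared mean: the token-specific part.
    residuals = [[tok[j] - mean[j] for j in range(dim)] for tok in tokens]
    return mean, residuals


def mean_mode_fraction(tokens: Sequence[Sequence[float]]) -> float:
    """Fraction of the average per-token energy carried by the shared mean.

    Uses the identity (1/N) sum ||x_i||^2 = ||mu||^2 + (1/N) sum ||x_i - mu||^2,
    so the result lies in [0, 1]. A value of 1.0 means every token is identical.
    """
    mean, residuals = split_mean_residual(tokens)
    mean_energy = sum(m * m for m in mean)
    var_energy = sum(r * r for res in residuals for r in res) / len(tokens)
    total = mean_energy + var_energy
    if total == 0.0:
        return 0.0  # all-zero input: no energy to attribute to either part
    return mean_energy / total


if __name__ == "__main__":
    diverse = [[1.0, 0.0], [-1.0, 0.0]]    # tokens differ; their mean is zero
    collapsed = [[2.0, 2.0], [2.0, 2.0]]   # every token carries the same vector
    print(mean_mode_fraction(diverse))     # 0.0
    print(mean_mode_fraction(collapsed))   # 1.0
```

Tracking such a ratio layer by layer is one simple way to see whether a deep stack is drifting toward a state in which the mean component drowns out token-specific variation.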