cs.CV, cs.LG

Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers

arXiv:2605.06169v1 Announce Type: cross
Abstract: Scaling Diffusion Transformers (DiTs) to hundreds of layers introduces a structural vulnerability: networks can enter a silent, mean-dominated collapse state that homogenizes token representations and …
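As an illustration of what "mean-dominated" could mean for a set of token representations, the sketch below splits tokens into their shared mean and a per-token centered remainder, then reports how much of the total energy sits in the mean. This is not taken from the paper: the function names `split_mean_centered` and `mean_energy_fraction` are hypothetical, and the energy ratio is only one plausible diagnostic for homogenized tokens, assuming each token is a plain vector of floats.

```python
from typing import List, Sequence, Tuple


def split_mean_centered(
    tokens: Sequence[Sequence[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Split tokens into the across-token mean and the centered remainder.

    Hypothetical helper for illustration; not the paper's residual design.
    """
    n = len(tokens)
    d = len(tokens[0])
    # Mean over the token axis, one value per feature dimension.
    mean = [sum(tok[j] for tok in tokens) / n for j in range(d)]
    # What is left of each token after removing the shared mean.
    centered = [[tok[j] - mean[j] for j in range(d)] for tok in tokens]
    return mean, centered


def mean_energy_fraction(tokens: Sequence[Sequence[float]]) -> float:
    """Fraction of total squared norm carried by the shared mean.

    Values near 1.0 indicate that tokens are nearly identical (homogenized);
    values near 0.0 indicate that token-specific variation dominates.
    """
    mean, centered = split_mean_centered(tokens)
    n = len(tokens)
    # The mean is shared by all n tokens, so its energy is counted n times.
    mean_energy = n * sum(m * m for m in mean)
    centered_energy = sum(c * c for row in centered for c in row)
    total = mean_energy + centered_energy
    return mean_energy / total if total > 0 else 0.0


if __name__ == "__main__":
    collapsed = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]  # identical tokens
    diverse = [[1.0, -1.0], [-1.0, 1.0]]              # zero-mean tokens
    print(mean_energy_fraction(collapsed))
    print(mean_energy_fraction(diverse))
```

Because the mean and centered parts are orthogonal when summed over tokens, the two energies add up to the total squared norm, so the ratio always lies between 0 and 1. Tracking such a ratio per layer is one way a collapse of this kind could be made visible, under the assumptions above.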