cond-mat.dis-nn, cs.LG, stat.ML

A Theory of Saddle Escape in Deep Nonlinear Networks

arXiv:2605.01288v1 Announce Type: new
Abstract: In deep networks with small initialization, training exhibits long plateaus separated by sharp feature-acquisition transitions. Whereas shallow nonlinear networks and deep linear networks are well studie…
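
The plateau-then-transition behavior named in the abstract's opening sentence can be illustrated with a toy sketch. The example below is an assumption made for illustration only and is not the paper's model or method: it trains a scalar deep *linear* chain (the well-studied case), f(x) = w_1 w_2 ... w_depth x, by plain gradient descent from a small initialization. The function name `train_scalar_chain` and all of its parameters are hypothetical.

```python
def train_scalar_chain(depth=3, init=0.1, target=1.0, lr=0.05, steps=2000):
    """Gradient descent on 0.5 * (w_1 * ... * w_depth - target)**2.

    Toy assumption: every weight is a scalar and starts at the same
    small value `init`, so the origin acts as a saddle of the loss.
    Returns the loss recorded before each update.
    """
    weights = [init] * depth
    losses = []
    for _ in range(steps):
        # Forward pass: the network output is the product of all weights.
        product = 1.0
        for w in weights:
            product *= w
        error = product - target
        losses.append(0.5 * error ** 2)

        # d(loss)/d(w_i) = error * (product of all the other weights).
        grads = []
        for i in range(depth):
            others = 1.0
            for j, w in enumerate(weights):
                if j != i:
                    others *= w
            grads.append(error * others)

        weights = [w - lr * g for w, g in zip(weights, grads)]
    return losses


if __name__ == "__main__":
    losses = train_scalar_chain()
    # Print a coarse trace: the loss barely moves early on, then drops.
    for step in range(0, len(losses), 50):
        print(step, round(losses[step], 4))
```

In this sketch each gradient is proportional to a product of depth - 1 small weights, so the early updates are tiny and the loss stays near its initial value before falling quickly once the weights have grown. Increasing `depth` or shrinking `init` should lengthen the flat phase in this toy setting; it makes no claim about the nonlinear networks the paper analyzes.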