cs.LG, stat.ML

Stochastic Gradient Descent in the Saddle-to-Saddle Regime of Deep Linear Networks

arXiv:2604.06366v1 Announce Type: cross
Abstract: Deep linear networks (DLNs) are used as an analytically tractable model of the training dynamics of deep neural networks. While gradient descent in DLNs is known to exhibit saddle-to-saddle dynamics, t…