The Origin of Edge of Stability
arXiv:2604.20446v1 Announce Type: new
Abstract: Full-batch gradient descent on neural networks drives the largest Hessian eigenvalue to the threshold $2/\eta$, where $\eta$ is the learning rate. This phenomenon, the Edge of Stability, has resisted a u…
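As a minimal illustration of why $2/\eta$ acts as a stability threshold, consider the textbook one-dimensional quadratic $f(x) = \tfrac{1}{2}\lambda x^2$. This is a standard toy case, not the paper's analysis, and the names `run_gd`, `curvature`, and `lr` are hypothetical. Each gradient descent step multiplies $x$ by $(1 - \eta\lambda)$, so the iterates shrink when $\lambda < 2/\eta$ and grow when $\lambda > 2/\eta$.

```python
def run_gd(curvature, lr, steps=50, x0=1.0):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2.

    Returns |x| after `steps` updates. The curvature plays the role of
    the largest Hessian eigenvalue in this one-dimensional toy problem.
    """
    x = x0
    for _ in range(steps):
        grad = curvature * x      # f'(x) = curvature * x
        x = x - lr * grad         # equivalent to x *= (1 - lr * curvature)
    return abs(x)


lr = 0.1
threshold = 2.0 / lr              # stability threshold 2 / eta

below = run_gd(0.9 * threshold, lr)   # curvature just under 2 / eta
above = run_gd(1.1 * threshold, lr)   # curvature just over 2 / eta

print(f"threshold 2/eta = {threshold}")
print(f"|x| after 50 steps, curvature below threshold: {below:.3e}")
print(f"|x| after 50 steps, curvature above threshold: {above:.3e}")
```

In this sketch the iterate decays toward zero below the threshold and blows up above it. On a fixed quadratic there is no mechanism that moves the curvature, so the toy case only explains where the value $2/\eta$ comes from, not why training would settle there.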