Large Spikes in Stochastic Gradient Descent: A Large-Deviations View
arXiv:2603.10079v2 Announce Type: replace
Abstract: Large loss spikes in stochastic gradient descent are studied through a rigorous large-deviations analysis for a shallow, fully connected network in the neural tangent kernel (NTK) scaling. In contrast to full-batch gradient…