Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training
arXiv:2603.28921v2 Announce Type: replace-cross
Abstract: Standard neural network training uses constant momentum (typically 0.9), a convention dating to 1964 with limited theoretical justification for its optimality. We derive a time-varying moment…