cs.AI, cs.CL, cs.LG, cs.NA, math.NA, math.OC

PowerStep: Memory-Efficient Adaptive Optimization via $\ell_p$-Norm Steepest Descent

arXiv:2605.10335v1 Announce Type: cross
Abstract: Adaptive optimizers, most notably Adam, have become the de facto standard for training large-scale neural networks such as Transformers. These methods maintain running estimates of the first and second moments of the gradient…
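For context, here is a minimal NumPy sketch of the Adam update that the abstract refers to. The two buffers `m` and `v` are the running first- and second-moment estimates; since each is the same size as the parameters, they double the optimizer's memory footprint, which is the cost that memory-efficient methods like the one in the title aim to reduce. The hyperparameter names and defaults follow the original Adam paper, not this abstract, and the function is an illustrative sketch rather than the paper's method.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient
    (first moment) and its elementwise square (second moment),
    with bias correction for the zero-initialized buffers."""
    m = beta1 * m + (1 - beta1) * grad       # running first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2    # running second-moment estimate
    m_hat = m / (1 - beta1**t)               # bias-corrected first moment
    v_hat = v / (1 - beta2**t)               # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Usage: t counts steps starting at 1; m and v start as zeros_like(param).
```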