cs.AI, cs.CL, cs.LG, cs.NA, math.NA, math.OC

PowerStep: Memory-Efficient Adaptive Optimization via $\ell_p$-Norm Steepest Descent

arXiv:2605.10335v1 Announce Type: cross
Abstract: Adaptive optimizers, most notably Adam, have become the de facto standard for training large-scale neural networks such as Transformers. These methods maintain running estimates of the first and second moments of the gradient…
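For context, here is a minimal NumPy sketch of the Adam update that the abstract refers to. The two buffers `m` and `v` are the running first- and second-moment estimates; since each is the same size as the parameters, they double the optimizer's memory footprint, which is the cost that memory-efficient methods like the one in the title aim to reduce. The hyperparameter names and defaults follow the original Adam paper, not this abstract, and the function is an illustrative sketch rather than the paper's method.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient
    (first moment) and its elementwise square (second moment),
    with bias correction for the zero-initialized buffers."""
    m = beta1 * m + (1 - beta1) * grad       # running first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2    # running second-moment estimate
    m_hat = m / (1 - beta1**t)               # bias-corrected first moment
    v_hat = v / (1 - beta2**t)               # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Usage: t counts steps starting at 1; m and v start as zeros_like(param).
```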