cs.AI, cs.LG, math.OC

Budget-aware Auto Optimizer Configurator

arXiv:2605.04711v1 Announce Type: cross
Abstract: Optimizer states consume a large amount of GPU memory in large-scale model training. However, gradients in different network blocks exhibit distinct behaviors, such as varying directional stability and scale anis…