Anon: Extrapolating Optimizer Adaptivity Across the Real Spectrum
arXiv:2605.02317v1 Announce Type: new
Abstract: Adaptive optimizers such as Adam have achieved great success in training large-scale models like large language models and diffusion models. However, they often generalize worse than non-adaptive methods…