cs.AI, cs.LG

Anon: Extrapolating Adaptivity Beyond SGD and Adam

arXiv:2605.02317v2 Announce Type: replace-cross
Abstract: Adaptive optimizers such as Adam have achieved great success in training large-scale models like large language models and diffusion models. However, they often generalize worse than non-adapti…
