Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails
arXiv:2603.03099v5 Announce Type: replace
Abstract: Despite Adam demonstrating faster empirical convergence than SGD in many applications, much of the existing theory yields guarantees essentially comparable to those of SGD, leaving the empirical perf…