cs.IT, cs.LG, math.IT, math.OC, stat.ML

On the Convergence Analysis of Muon

arXiv:2505.23737v2
Abstract: The majority of parameters in neural networks are naturally represented as matrices. However, most commonly used optimizers treat these matrix parameters as flattened vectors during optimization, pot…