On the Convergence Analysis of Muon
arXiv:2505.23737v2 Announce Type: replace
Abstract: The majority of parameters in neural networks are naturally represented as matrices. However, most commonly used optimizers treat these matrix parameters as flattened vectors during optimization, pot…