The Newton-Muon Optimizer
arXiv:2604.01472v1 Announce Type: cross
Abstract: The Muon optimizer has received considerable attention for its strong performance in training large language models, yet the design principle behind its matrix-gradient orthogonalization remains largel…
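Matrix-gradient orthogonalization, as used in Muon, replaces a gradient (or momentum) matrix G = U Σ V^T with U V^T, which sets every singular value to 1 while keeping the singular directions. Below is a minimal NumPy sketch of that baseline idea. It makes three assumptions. It uses the cubic Newton-Schulz iteration X ← 1.5 X − 0.5 X Xᵀ X in place of the tuned quintic polynomial found in public Muon implementations. It uses plain momentum instead of Nesterov momentum. The names `orthogonalize_newton_schulz`, `muon_style_step`, `lr`, and `beta` are hypothetical. The sketch shows Muon-style orthogonalization only. It is not the Newton-Muon update proposed in this paper.

```python
import numpy as np


def orthogonalize_svd(g: np.ndarray) -> np.ndarray:
    """Exact orthogonalization: replace every singular value of g by 1 (returns U V^T)."""
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    return u @ vt


def orthogonalize_newton_schulz(g: np.ndarray, steps: int = 30, eps: float = 1e-7) -> np.ndarray:
    """Approximate U V^T with the cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X.

    Dividing by the Frobenius norm puts every singular value in (0, 1], inside the
    iteration's convergence region (0, sqrt(3)); each singular value s is mapped
    to 1.5 s - 0.5 s^3, which has a superattracting fixed point at s = 1.
    """
    transposed = g.shape[0] > g.shape[1]
    # Work in the wide orientation so the Gram matrix X X^T is the smaller one.
    x = g.T if transposed else g
    x = x / (np.linalg.norm(x) + eps)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x.T if transposed else x


def muon_style_step(w, grad, momentum, lr=0.02, beta=0.95, steps=30):
    """Hypothetical simplified step: accumulate momentum, orthogonalize it, then descend.

    lr and beta are illustrative values, not tuned settings.
    """
    momentum = beta * momentum + grad
    w = w - lr * orthogonalize_newton_schulz(momentum, steps=steps)
    return w, momentum


if __name__ == "__main__":
    g = np.array([[2.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0],
                  [1.0, 0.0, 3.0],
                  [0.0, 1.0, 1.0]])
    q = orthogonalize_newton_schulz(g)
    print(np.round(q.T @ q, 6))  # columns are orthonormal, so this is the 3x3 identity
```

The iteration uses only matrix products, so it avoids an explicit SVD. The SVD version appears only as a reference for checking the approximation.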