cs.LG

Muown: Row-Norm Control for Muon Optimization

arXiv:2605.10797v1
Abstract: Muon has emerged as a strong competitor to AdamW for language model pre-training, yet its behavior at scale is sensitive to weight decay. Recent work has observed that, for Muon without decoupled weight …
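To make the role of weight decay in Muon concrete, here is a minimal NumPy sketch of a single Muon-style step with decoupled (AdamW-style) weight decay. The update orthogonalizes the momentum with the quintic Newton-Schulz iteration used in the public Muon reference code. This is an illustration under stated assumptions, not the paper's Muown method. It uses plain heavy-ball momentum instead of Nesterov momentum and omits the shape-dependent update scaling. The names `newton_schulz_orthogonalize`, `muon_step`, and `row_norms` are hypothetical, and `row_norms` only shows how the per-row weight norms named in the title could be monitored.

```python
import numpy as np

def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Approximately map g to U @ V.T from its SVD (singular values pushed toward ~1)."""
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic coefficients from the public Muon code
    x = g / (np.linalg.norm(g) + eps)  # Frobenius normalization keeps all singular values <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # work in the wide orientation so the Gram matrix x @ x.T is the smaller one
    for _ in range(steps):
        gram = x @ x.T
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x

def muon_step(w, grad, momentum, lr=0.02, beta=0.95, weight_decay=0.0):
    """One simplified Muon step (hypothetical helper: heavy-ball momentum, no update scaling)."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(momentum)
    # Decoupled weight decay: shrink the weights directly, independent of the gradient.
    w = w * (1.0 - lr * weight_decay) - lr * update
    return w, momentum

def row_norms(w: np.ndarray) -> np.ndarray:
    """Euclidean norm of each row of a weight matrix (hypothetical monitoring helper)."""
    return np.linalg.norm(w, axis=1)

rng = np.random.default_rng(0)
w = 0.02 * rng.standard_normal((64, 32))
m = np.zeros_like(w)
for _ in range(10):
    w, m = muon_step(w, rng.standard_normal(w.shape), m, weight_decay=0.1)
print(row_norms(w)[:4])
```

Setting `weight_decay=0.0` gives Muon without decoupled weight decay. Because the orthogonalized update has roughly unit singular values whatever the gradient scale, only the decay term directly counteracts growth in the weight norms.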