cs.LG

MuonQ: Enhancing Low-Bit Muon Quantization via Directional Fidelity Optimization

arXiv:2605.11396v1
Abstract: The Muon optimizer has emerged as a compelling alternative to Adam for training large language models, achieving remarkable computational savings through gradient orthogonalization. However, Muon’s optim…