Pro-KLShampoo: Projected KL-Shampoo with Whitening Recovered by Orthogonalization
arXiv:2605.06316v1 Announce Type: cross
Abstract: Optimizers that exploit the matrix structure of gradients are central to modern LLM pre-training, with two distinct frontiers: explicit Kronecker-factored preconditioning — most recently KL-Shampoo, w…