cs.AI, cs.LG, math.OC

Demystifying Manifold Constraints in LLM Pre-training

arXiv:2605.04418v1 Announce Type: new
Abstract: The empirical success of large language model (LLM) pre-training relies heavily on heuristic stabilization techniques, such as explicit normalization layers and weight decay. While recent constrained opt…