Demystifying Manifold Constraints in LLM Pre-training
arXiv:2605.04418v1 Announce Type: new
Abstract: The empirical success of large language model (LLM) pre-training relies heavily on heuristic stabilization techniques, such as explicit normalization layers and weight decay. While recent constrained opt…
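The two stabilization heuristics the abstract names can be illustrated concretely. Below is a minimal NumPy sketch (not from the paper; the function names and hyperparameters are illustrative assumptions) of an explicit layer-normalization forward pass and a decoupled weight-decay update of the kind used in AdamW-style optimizers:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Explicit normalization layer: rescale each row to
    # zero mean and (approximately) unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def sgd_weight_decay_step(w, grad, lr=0.1, wd=0.01):
    # Decoupled weight decay: shrink the weights toward zero
    # each step, independently of the loss gradient.
    return w - lr * grad - lr * wd * w

x = np.array([[1.0, 2.0, 3.0]])
y = layer_norm(x)
print(y.mean(axis=-1), y.var(axis=-1))  # ~0 mean, ~1 variance per row
```

Both mechanisms keep iterates near a well-conditioned region of parameter space, which is the kind of behavior constrained-optimization views of pre-training aim to explain.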