cs.LG

Nexus: Same Pretraining Loss, Better Downstream Generalization via Common Minima

arXiv:2604.09258v1 Announce Type: new
Abstract: Pretraining is the cornerstone of Large Language Models (LLMs), consuming the vast majority of the computational budget and data and serving as the primary engine for their capabilities. During pretraining, LL…