Curriculum Learning for LLM Pretraining: An Analysis of Learning Dynamics
arXiv:2601.21698v2 Announce Type: replace-cross
Abstract: Curriculum learning changes the order of pretraining data, but it remains unclear how ordering changes the learning dynamics. We pretrain models from 14M to 1B parameters for 300B tokens under …