Revealing the Learning Dynamics of Long-Context Continual Pre-training
arXiv:2604.02650v1
Abstract: Existing studies on Long-Context Continual Pre-training (LCCP) mainly focus on small-scale models and limited data regimes (tens of billions of tokens). We argue that directly migrating these small-scale…