cs.CL, cs.LG

Scaling Laws for Mixture Pretraining Under Data Constraints

arXiv:2605.12715v1 Announce Type: new
Abstract: As language models scale, the amount of data they require grows, yet many target data sources, such as low-resource languages or specialized domains, are inherently limited in size. A common strategy i…