Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization
arXiv:2603.08022v2 Announce Type: replace
Abstract: A data mixture describes how different data sources are combined to train large language models (LLMs), and selecting an effective mixture is crucial for downstream performance. Existing methods eit…