Authors: Haiyue Song, Masao Utiyama

OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training

Haiyue Song, Masao Utiyama / April 7, 2026

arXiv:2603.28858v2 Announce Type: replace-cross
Abstract: Continual pre-training is widely used to adapt LLMs to target languages and domains, yet the mixture ratio of training data remains a sensitive hyperparameter that is expensive to tune: they mu…

cs.AI, cs.CL, cs.LG

OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training

Haiyue Song, Masao Utiyama / April 1, 2026

arXiv:2603.28858v1 Announce Type: new
Abstract: Continual pre-training is widely used to adapt LLMs to target languages and domains, yet the mixture ratio of training data remains a sensitive hyperparameter that is expensive to tune: they must be fixe…