Dual-objective Language Models: Training Efficiency Without Overfitting
arXiv:2512.14549v3 Announce Type: replace
Abstract: This paper combines autoregressive and masked-diffusion training objectives without any architectural modifications, resulting in flexible language models that outperform single-objective models. Aut…
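The announcement gives no implementation details, but the core idea (one model trained under both a next-token autoregressive loss and a masked-prediction loss, with no architectural change) can be sketched roughly as follows. All names, the loss weighting `lam`, and the use of plain cross-entropy for both terms are assumptions for illustration, not the paper's actual method:

```python
import numpy as np

def cross_entropy(logits, targets, mask):
    """Softmax cross-entropy averaged over positions where mask == 1.

    logits: (T, V) array of unnormalized scores, targets: (T,) token ids.
    """
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    nll = -logp[np.arange(len(targets)), targets]
    return (nll * mask).sum() / mask.sum()

def dual_objective_loss(logits_ar, logits_mdm, tokens, mdm_mask, lam=0.5):
    """Weighted sum of an autoregressive loss and a masked-prediction loss.

    logits_ar / logits_mdm: per-position logits from the *same* network under
    the two training modes (causal next-token vs. masked denoising).
    mdm_mask: 1 at positions that were masked out for the diffusion objective.
    """
    # Autoregressive term: position t predicts token t+1 (shift by one).
    ar_loss = cross_entropy(logits_ar[:-1], tokens[1:],
                            np.ones(len(tokens) - 1))
    # Masked-diffusion term: loss only on the masked positions.
    mdm_loss = cross_entropy(logits_mdm, tokens, mdm_mask)
    return lam * ar_loss + (1 - lam) * mdm_loss

# Toy usage with random logits standing in for model outputs.
rng = np.random.default_rng(0)
T, V = 6, 10
tokens = rng.integers(0, V, T)
loss = dual_objective_loss(rng.standard_normal((T, V)),
                           rng.standard_normal((T, V)),
                           tokens,
                           mdm_mask=np.array([1, 0, 1, 0, 1, 0]))
print(float(loss))
```

In a real training loop both forward passes would come from the same transformer (differing only in attention masking and input corruption), so the single parameter set is optimized under both objectives at once.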