Aswin RRV, Jacob Dineen, Divij Handa, Mihir Parmar, Ben Zhou, Swaroop Mishra, Chitta Baral

Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models

Aswin RRV, Jacob Dineen, Divij Handa, Mihir Parmar, Ben Zhou, Swaroop Mishra, Chitta Baral / May 12, 2026

arXiv:2605.08472v1 Announce Type: new
Abstract: The effectiveness of Reinforcement Learning (RL) in Large Language Models (LLMs) depends on the nature and diversity of the data used before and during RL. In particular, reasoning problems can often be …

Author name: Aswin RRV, Jacob Dineen, Divij Handa, Mihir Parmar, Ben Zhou, Swaroop Mishra, Chitta Baral

Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models