Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models
arXiv:2605.08472v1 Announce Type: new
Abstract: The effectiveness of Reinforcement Learning (RL) in Large Language Models (LLMs) depends on the nature and diversity of the data used before and during RL. In particular, reasoning problems can often be …