cs.LG

Too Correct to Learn: Reinforcement Learning on Saturated Reasoning Data

arXiv:2604.18493v1 Announce Type: new
Abstract: Reinforcement Learning (RL) enhances LLM reasoning, yet a paradox emerges as models scale: strong base models saturate standard benchmarks (e.g., MATH), yielding correct but homogeneous solutions. In suc…
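Saturated data can leave RL with little to learn from. One way to see this is to assume a GRPO-style, group-normalized advantage. That is an assumption: the listing does not name the RL algorithm. Under it, when every sampled solution to a prompt is correct, all rewards in the group are equal. Every advantage is then zero, so the prompt contributes no policy-gradient signal. The sketch below uses a hypothetical binary correctness reward and the population standard deviation. Implementations differ in their normalization details.

```python
import math


def group_relative_advantages(rewards, eps=1e-6):
    """Hypothetical GRPO-style advantage: (r - group mean) / (group std + eps)."""
    n = len(rewards)
    mean = sum(rewards) / n
    # Population standard deviation of the group's rewards.
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical groups of 4 sampled solutions, scored 1.0 if correct and 0.0 if not.
saturated = [1.0, 1.0, 1.0, 1.0]  # every sample correct, as on a saturated benchmark
mixed = [1.0, 0.0, 1.0, 0.0]      # some samples wrong

print(group_relative_advantages(saturated))  # [0.0, 0.0, 0.0, 0.0]
print(group_relative_advantages(mixed))      # roughly +1 for correct, -1 for wrong
```

In the saturated group, every advantage collapses to zero regardless of how the solutions differ. Correct but homogeneous samples therefore give this kind of objective nothing to push on.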