cs.AI, cs.LG, math.OC, stat.ML

The Implicit Curriculum: Learning Dynamics in RL with Verifiable Rewards

arXiv:2602.14872v2 Announce Type: replace-cross
Abstract: Reinforcement learning with verifiable rewards (RLVR) has been a main driver of recent breakthroughs in large reasoning models. Yet it remains a mystery how rewards based solely on final outcom…