The Implicit Curriculum: Learning Dynamics in RL with Verifiable Rewards
arXiv:2602.14872v2 Announce Type: replace-cross
Abstract: Reinforcement learning with verifiable rewards (RLVR) has been a main driver of recent breakthroughs in large reasoning models. Yet it remains a mystery how rewards based solely on final outcom…