Selector-Guided Autonomous Curriculum for One-Shot Reinforcement Learning from Verifiable Rewards
arXiv:2605.01823v1 Announce Type: cross
Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has recently been established as a highly effective technique for improving the mathematical reasoning skills of Large Language Models (LLMs) based on a …