How You Begin is How You Reason: Driving Exploration in RLVR via Prefix-Tuned Priors
arXiv:2605.08817v1 Announce Type: new
Abstract: Reinforcement learning with verifiable rewards (RLVR) has recently thrived in large language model (LLM) reasoning tasks. However, reward sparsity and the long reasoning horizon make effective exploratio…