cs.AI, cs.CL, cs.LG

Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification

arXiv:2601.21244v3 Announce Type: replace-cross
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has advanced LLM reasoning but remains constrained by inefficient exploration under limited rollout budgets, leading to low sampling succe…