cs.AI, cs.LG

Beyond Uniform Credit Assignment: Selective Eligibility Traces for RLVR

arXiv:2605.05965v1 Announce Type: new
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become a key approach for improving the reasoning abilities of large language models. However, widely used critic-free algorithms such as Group R…