Rethinking RL for LLM Reasoning: It’s Sparse Policy Selection, Not Capability Learning
arXiv:2605.06241v1
Abstract: Reinforcement learning has become the standard for improving reasoning in large language models, yet evidence increasingly suggests that RL does not teach new strategies; it redistributes probability mas…