Ömer Faruk Akgül, Rajgopal Kannan, Willie Neiswanger, Viktor Prasanna

Rethinking RL for LLM Reasoning: It’s Sparse Policy Selection, Not Capability Learning

Ömer Faruk Akgül, Rajgopal Kannan, Willie Neiswanger, Viktor Prasanna / May 8, 2026

arXiv:2605.06241v1 Announce Type: new
Abstract: Reinforcement learning has become the standard for improving reasoning in large language models, yet evidence increasingly suggests that RL does not teach new strategies; it redistributes probability mas…
