cs.AI

Does RLVR Extend Reasoning Boundaries? Investigating Capability Expansion in Vision-Language Models

arXiv:2511.00710v4 Announce Type: replace
Abstract: Recent studies posit that Reinforcement Learning with Verifiable Rewards (RLVR) primarily amplifies behaviors inherent to the pre-training distribution rather than inducing new capabilities, but thes…