cs.AI, cs.CL

Miner: Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models

arXiv:2601.04731v2 Announce Type: replace-cross
Abstract: Current critic-free RL methods for large reasoning models suffer from severe inefficiency when training on positive homogeneous prompts (where all rollouts are correct), resulting in waste of r…
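The inefficiency on positive homogeneous prompts can be illustrated with a minimal sketch. The abstract does not name a specific method, so this assumes a group-relative advantage of the kind used in GRPO-style training: each rollout's reward is normalized by the mean and standard deviation of its own prompt group. The function name `group_relative_advantages`, the `eps` constant, and the binary 0/1 rewards are illustrative assumptions, not details from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own prompt group.

    Hypothetical helper: assumes a GRPO-style critic-free advantage,
    (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Mixed group: some rollouts correct (1.0), some wrong (0.0).
mixed = group_relative_advantages([1.0, 0.0, 1.0, 0.0])

# Positive homogeneous group: every rollout is correct.
all_correct = group_relative_advantages([1.0, 1.0, 1.0, 1.0])

print("mixed:", mixed)
print("all correct:", all_correct)
```

Under this assumption, the all-correct group yields an advantage of zero for every rollout, so those rollouts contribute no policy-gradient signal even though they were paid for in generation compute.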