SARL: Label-Free Reinforcement Learning by Rewarding Reasoning Topology
arXiv:2603.27977v1
Abstract: Reinforcement learning has become central to improving large reasoning models, but its success still relies heavily on verifiable rewards or labeled supervision. This limits its applicability to open end…