Verifier-Free RL for LLMs via Intrinsic Gradient-Norm Reward
arXiv:2605.09920v1 Announce Type: cross
Abstract: While Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as a promising post-training paradigm for Large Language Models (LLMs), its dependence on gold labels or domain-speci…