Hengrui Gu, Xiaotian Han, Yujing Bian, Kaixiong Zhou

Rethinking Exploration in RLVR: From Entropy Regularization to Refinement via Bidirectional Entropy Modulation

April 7, 2026

arXiv:2604.04894v1 Announce Type: new
Abstract: Reinforcement learning with verifiable rewards (RLVR) has significantly advanced the reasoning capabilities of large language models (LLMs). However, it faces a fundamental limitation termed rest…
