Song Yu, Li Li - Provide.ai

ERPO: Token-Level Entropy-Regulated Policy Optimization for Large Reasoning Models

Song Yu, Li Li / March 31, 2026

arXiv:2603.28204v1
Abstract: Reinforcement learning from verifiable rewards (RLVR) has significantly advanced the reasoning capabilities of large language models. However, standard Group Relative Policy Optimization (GRPO) typically…