cs.AI

Targeted Exploration via Unified Entropy Control for Reinforcement Learning

arXiv:2604.14646v2
Abstract: Recent advances in reinforcement learning (RL) have improved the reasoning capabilities of large language models (LLMs) and vision-language models (VLMs). However, the widely used Group Relative Poli…