cs.CL

Heterogeneous Adaptive Policy Optimization: Tailoring Optimization to Every Token’s Nature

arXiv:2509.16591v2 Announce Type: replace
Abstract: Using entropy as a measure of heterogeneity to guide optimization has emerged as a crucial research direction in Reinforcement Learning for LLMs. However, existing methods typically treat it as a dis…