Bringing Value Models Back: Generative Critics for Value Modeling in LLM Reinforcement Learning
arXiv:2604.10701v1 Announce Type: cross
Abstract: Credit assignment is a central challenge in reinforcement learning (RL). Classical actor-critic methods address this challenge through fine-grained advantage estimation based on a learned value functio…
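
As an illustration of the advantage estimation that classical actor-critic methods rely on, here is a minimal sketch of generalized advantage estimation (GAE), one common way to turn a learned value function into per-step advantages. It is not taken from the paper: the function name `gae_advantages`, the parameters `gamma` and `lam`, and the toy numbers are assumptions chosen for illustration.

```python
from typing import List


def gae_advantages(
    rewards: List[float],
    values: List[float],
    gamma: float = 0.99,  # discount factor (assumed default)
    lam: float = 0.95,    # GAE mixing parameter (assumed default)
) -> List[float]:
    """Compute per-step advantages from rewards and value estimates.

    `values` must hold one more entry than `rewards`: the final entry is
    the bootstrap value of the state after the last step (0.0 if terminal).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the discounted sum from later steps.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error under the learned value function.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Hypothetical three-step episode with a single terminal reward.
toy_rewards = [0.0, 0.0, 1.0]
toy_values = [0.5, 0.5, 0.5, 0.0]
print(gae_advantages(toy_rewards, toy_values, gamma=1.0, lam=1.0))
```

With `lam=1.0` every step shares credit for the final reward, while `lam=0.0` reduces each advantage to the one-step temporal-difference error, so only steps where the value estimate was wrong receive credit.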