Exploring Reasoning Reward Model for Agents
arXiv:2601.22154v2 Announce Type: replace-cross
Abstract: Agentic Reinforcement Learning (Agentic RL) has achieved notable success in enabling agents to perform complex reasoning and tool use. However, most methods still rely on sparse outcome-based…