Aligning Agents via Planning: A Benchmark for Trajectory-Level Reward Modeling
arXiv:2604.08178v2 Announce Type: replace
Abstract: In classical Reinforcement Learning from Human Feedback (RLHF), Reward Models (RMs) provide the fundamental training signal for model alignment. As Large Language Models evolve into agentic systems …
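To make the core idea concrete, below is a minimal, hypothetical sketch of what trajectory-level reward modeling means in practice: instead of scoring a single response, the RM consumes an entire agent trajectory (a sequence of step encodings) and emits one scalar reward, trained with a Bradley-Terry-style preference loss as is standard in RLHF. This is not the paper's implementation; all names (`TrajectoryRewardModel`, `step_dim`, the GRU encoder) are illustrative assumptions.

```python
# Illustrative sketch only -- NOT the paper's method. Assumes each
# trajectory step is pre-encoded as a fixed-size vector (e.g., an
# embedding of a state-action pair).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrajectoryRewardModel(nn.Module):
    """Maps an entire agent trajectory to a single scalar reward."""

    def __init__(self, step_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        # A GRU stands in for whatever sequence encoder a real RM would use.
        self.encoder = nn.GRU(step_dim, hidden_dim, batch_first=True)
        # Final hidden state -> scalar trajectory-level reward.
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, steps: torch.Tensor) -> torch.Tensor:
        # steps: (batch, num_steps, step_dim)
        _, h_last = self.encoder(steps)       # h_last: (1, batch, hidden_dim)
        return self.head(h_last.squeeze(0))   # (batch, 1) scalar rewards

# Score a batch of 4 trajectories, each with 10 steps.
rm = TrajectoryRewardModel()
chosen = torch.randn(4, 10, 64)    # preferred trajectories
rejected = torch.randn(4, 10, 64)  # dispreferred trajectories
print(rm(chosen).shape)            # torch.Size([4, 1])

# Pairwise Bradley-Terry preference loss, as commonly used to train RMs:
# maximize the margin between chosen and rejected trajectory rewards.
loss = -F.logsigmoid(rm(chosen) - rm(rejected)).mean()
loss.backward()
```

The design choice to pool the whole step sequence into one score is what distinguishes a trajectory-level RM from the per-response RMs of classical RLHF.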