Multi-Turn Reinforcement Learning for Tool-Calling Agents with Iterative Reward Calibration
arXiv:2604.02869v1 Announce Type: new
Abstract: Training tool-calling agents with reinforcement learning on multi-turn tasks remains challenging due to sparse outcome rewards and difficult credit assignment across conversation turns. We present the fi…