StepPO: Step-Aligned Policy Optimization for Agentic Reinforcement Learning
arXiv:2604.18401v1
Abstract: General agents have given rise to phenomenal applications such as OpenClaw and Claude Code. As these agent systems (a.k.a. Harnesses) strive for bolder goals, they demand increasingly strong agentic ca…