ARM: Advantage Reward Modeling for Long-Horizon Manipulation
arXiv:2604.03037v2 Announce Type: replace
Abstract: Long-horizon robotic manipulation remains challenging for reinforcement learning (RL) because sparse rewards provide limited guidance for credit assignment. Practical policy improvement thus relies o…