Robust Optimization for Mitigating Reward Hacking with Correlated Proxies
arXiv:2604.12086v1 Announce Type: new
Abstract: Designing robust reinforcement learning (RL) agents in the presence of imperfect reward signals remains a core challenge. In practice, agents are often trained with proxy rewards that only approximate th…