cs.AI, cs.LG

Uncertainty-Aware Reward Discounting for Mitigating Reward Hacking

arXiv:2604.26360v1 Announce Type: cross
Abstract: Reinforcement learning (RL) systems typically optimize scalar reward functions that assume precise and reliable evaluation of outcomes. However, real-world objectives, especially those derived from hum…