Golden Handcuffs make safer AI agents
arXiv:2604.13609v1 Announce Type: cross
Abstract: Reinforcement learners can attain high reward through novel unintended strategies. We study a Bayesian mitigation for general environments: we expand the agent’s subjective reward range to include a la…