cs.AI, cs.LG

Golden Handcuffs make safer AI agents

arXiv:2604.13609v1 Announce Type: cross
Abstract: Reinforcement learners can attain high reward through novel unintended strategies. We study a Bayesian mitigation for general environments: we expand the agent’s subjective reward range to include a la…