Implementing surrogate goals for safer bargaining in LLM-based agents
arXiv:2604.04341v1 Announce Type: new
Abstract: Surrogate goals have been proposed as a strategy for reducing risks from bargaining failures. A surrogate goal is a goal that a principal can give an AI agent and that deflects any threats against the agen…