Charlie Griffin, Louis Thomson, Buck Shlegeris, Alessandro Abate

Games for AI Control: Models of Safety Evaluations of AI Deployment Protocols

May 8, 2026

arXiv:2409.07985v2
Abstract: To evaluate the safety and usefulness of deployment protocols for untrusted AIs, AI Control uses a red-teaming exercise played between a protocol designer and an adversary. This paper introduces AI-C…
