Royce Moon, Lav R. Varshney

Containment Verification: AI Safety Guarantees Independent of Alignment

Royce Moon, Lav R. Varshney / May 12, 2026

arXiv:2605.09045v1 Announce Type: new
Abstract: Agentic frameworks are the software layer through which AI agents act in the world. Existing safety methods intervene on the model and therefore remain conditional on unverifiable properties of learned b…
