Kundan Krishna, Joseph Y Cheng, Charles Maalouf, Leon A Gatys

Disentangled Safety Adapters Enable Efficient Guardrails and Flexible Inference-Time Alignment

May 4, 2026

arXiv:2506.00166v2 Announce Type: replace-cross
Abstract: Existing paradigms for ensuring AI safety, such as guardrail models and alignment training, often compromise either inference efficiency or development flexibility. We introduce Disentangled Sa…
