cs.AI, cs.CR, cs.LG

RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models

arXiv:2602.04448v2 Announce Type: replace-cross
Abstract: Mixture-of-Experts (MoE) language models introduce unique challenges for safety alignment due to their sparse routing mechanisms, which can enable degenerate optimization behaviors under standa…