RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models
arXiv:2602.04448v2
Abstract: Mixture-of-Experts (MoE) language models introduce unique challenges for safety alignment due to their sparse routing mechanisms, which can enable degenerate optimization behaviors under standa…