Md Rysul Kabir, Zoran Tiganj

Different Paths to Harmful Compliance: Behavioral Side Effects and Mechanistic Divergence Across LLM Jailbreaks

Md Rysul Kabir, Zoran Tiganj / April 21, 2026

arXiv:2604.18510v1 Announce Type: cross
Abstract: Open-weight language models can be rendered unsafe through several distinct interventions, but the resulting models may differ substantially in capabilities, behavioral profile, and internal failure mo…
