cs.AI

ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack

arXiv:2509.25843v2 Announce Type: replace
Abstract: Large language models (LLMs), despite being safety-aligned, exhibit brittle refusal behaviors that can be circumvented by simple linguistic changes. As tense jailbreaking demonstrates that models ref…
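The truncated abstract names the method (an "Activation-Scaling Guard") but not its mechanism. Below is a minimal sketch of the general activation-scaling idea as commonly applied to transformer hidden states: a forward hook that rescales selected hidden channels of one layer. The channel indices, scale factor, and hook placement are illustrative assumptions, not the paper's actual specification.

```python
import torch
import torch.nn as nn

class ActivationScalingHook:
    """Illustrative guard: rescale selected hidden channels in one layer.

    NOTE: channel indices and scale are hypothetical stand-ins; the
    paper's actual mechanism is not given in the truncated abstract.
    """

    def __init__(self, channel_idx: torch.Tensor, scale: float):
        self.channel_idx = channel_idx  # which hidden channels to dampen
        self.scale = scale              # multiplicative scaling factor

    def __call__(self, module, inputs, output):
        # output: hidden states of shape (batch, seq_len, hidden_dim)
        hidden = output.clone()
        hidden[..., self.channel_idx] *= self.scale
        return hidden  # returned value replaces the module's output


# Toy demonstration on a single linear layer standing in for a
# transformer block (real usage would hook a decoder layer of an LLM).
layer = nn.Linear(8, 8)
hook = ActivationScalingHook(channel_idx=torch.tensor([2, 5]), scale=0.1)
handle = layer.register_forward_hook(hook)

x = torch.randn(1, 4, 8)            # (batch, seq, hidden)
y = layer(x)
print(y[..., [2, 5]].abs().mean())  # damped channels
handle.remove()
```

In this framing, a guard of this kind would leave most of the forward pass untouched and intervene only on the activations implicated in the targeted jailbreak, which is one plausible reading of "activation-scaling" given only the title and truncated abstract.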