cs.AI, cs.CL, cs.CR

Activation-Guided Local Editing for Jailbreaking Attacks

arXiv:2508.00555v2 Announce Type: replace-cross
Abstract: Jailbreaking is an essential adversarial technique for red-teaming large language models (LLMs) to uncover and patch security flaws. However, existing jailbreak methods face significant drawbacks. Token-level …