Activation-Guided Local Editing for Jailbreaking Attacks
arXiv:2508.00555v2
Abstract: Jailbreaking is an essential adversarial technique for red-teaming large language models (LLMs) to uncover and patch security flaws. However, existing jailbreak methods face significant drawbacks. Token-level …