Uncovering Logit Suppression Vulnerabilities in LLM Safety Alignment
arXiv:2405.13068v4
Abstract: Large language models (LLMs) have revolutionized various applications, making robust safety alignment essential to prevent harmful outputs. Current safety alignment techniques, however, harbor …