cs.AI, cs.CR, cs.LG

Uncovering Logit Suppression Vulnerabilities in LLM Safety Alignment

arXiv:2405.13068v4 Announce Type: replace-cross
Abstract: Large language models (LLMs) are now used across a wide range of applications, which makes robust safety alignment essential for preventing harmful outputs. Current safety alignment techniques, however, harbor …