cs.CL, cs.CR

One Word at a Time: Incremental Completion Decomposition Breaks LLM Safety

arXiv:2604.25921v1 Announce Type: new
Abstract: Large Language Models (LLMs) are trained to refuse harmful requests, yet they remain vulnerable to jailbreak attacks that exploit weaknesses in conversational safety mechanisms. We introduce Incremental …