One Word at a Time: Incremental Completion Decomposition Breaks LLM Safety
arXiv:2604.25921v1
Abstract: Large Language Models (LLMs) are trained to refuse harmful requests, yet they remain vulnerable to jailbreak attacks that exploit weaknesses in conversational safety mechanisms. We introduce Incremental …