Sockpuppetting: Jailbreaking LLMs by Combining Prefilling with Optimization
arXiv:2601.13359v2 Announce Type: replace-cross
Abstract: Prefill attacks are an effective and low-cost jailbreaking method, as they directly insert an acceptance sequence (e.g., “Sure, here is how to…”) at the start of an LLM’s output and lead the …