cs.CL, cs.CR, cs.LG

Sockpuppetting: Jailbreaking LLMs by Combining Prefilling with Optimization

arXiv:2601.13359v2 Announce Type: replace-cross
Abstract: Prefill attacks are an effective and low-cost jailbreaking method, as they directly insert an acceptance sequence (e.g., “Sure, here is how to…”) at the start of an LLM’s output and lead the …