cs.LG

On the Hardness of Jailbreaking LLMs

arXiv:2605.05116v1 Announce Type: new
Abstract: Large language models (LLMs) are known to be vulnerable to jailbreak attacks, which typically rely on carefully designed prompts containing explicit semantic structure. These attacks generally operate by…