Evaluation of Prompt Injection Defenses in Large Language Models
arXiv:2604.23887v1 Announce Type: cross
Abstract: LLM-powered applications routinely embed secrets in system prompts, yet models can be tricked into revealing them. We built an adaptive attacker that evolves its strategies over hundreds of rounds and …