cs.AI, cs.CR

Evaluation of Prompt Injection Defenses in Large Language Models

arXiv:2604.23887v1 Announce Type: cross
Abstract: LLM-powered applications routinely embed secrets in system prompts, yet models can be tricked into revealing them. We built an adaptive attacker that evolves its strategies over hundreds of rounds and …