Prompt Injection as Role Confusion
arXiv:2603.12277v4 Announce Type: replace
Abstract: Language models remain vulnerable to prompt injection attacks despite extensive safety training. We trace this failure to role confusion: models infer the source of text based on how it sounds, not w…