Author name: Charles Ye, Jasmine Cui, Dylan Hadfield-Menell

Prompt Injection as Role Confusion

Charles Ye, Jasmine Cui, Dylan Hadfield-Menell / April 17, 2026

arXiv:2603.12277v4 Announce Type: replace
Abstract: Language models remain vulnerable to prompt injection attacks despite extensive safety training. We trace this failure to role confusion: models infer the source of text based on how it sounds, not w…

cs.AI, cs.CL, cs.CR

Prompt Injection as Role Confusion

Charles Ye, Jasmine Cui, Dylan Hadfield-Menell / April 14, 2026

arXiv:2603.12277v3 Announce Type: replace
Abstract: Language models remain vulnerable to prompt injection attacks despite extensive safety training. We trace this failure to role confusion: models infer the source of text based on how it sounds, not w…