Persona Non Grata: Single-Method Safety Evaluation Is Incomplete for Persona-Imbued LLMs
arXiv:2604.11120v2 Announce Type: replace
Abstract: Personality imbuing customizes LLM behavior, but safety evaluations almost always study prompt-based personas alone. We show this is incomplete: prompting and activation steering expose *different*, …