cs.CL, cs.CR, cs.LG

LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users

arXiv:2507.02850v3 Announce Type: replace
Abstract: We describe a vulnerability in language models (LMs) trained with user feedback, whereby a single user can persistently alter LM knowledge and behavior given only the ability to provide prompts and u…