Almog Hilel, Riddhi Bhagwat, Idan Shenfeld, Jacob Andreas, Leshem Choshen

LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users

April 21, 2026

arXiv:2507.02850v3 Announce Type: replace
Abstract: We describe a vulnerability in language models (LMs) trained with user feedback, whereby a single user can persistently alter LM knowledge and behavior given only the ability to provide prompts and u…
