Muhammad Ahmed Mohsin, Ahsan Bilal, Muhammad Umer, Emily Fox

Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition

Muhammad Ahmed Mohsin, Ahsan Bilal, Muhammad Umer, Emily Fox / April 8, 2026

arXiv:2604.05279v1
Abstract: Large language models exhibit sycophancy, the tendency to shift their stated positions toward perceived user preferences or authority cues regardless of evidence. Standard alignment methods fail to corre…