cs.AI

Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition

arXiv:2604.05279v1 Announce Type: new
Abstract: Large language models exhibit sycophancy, the tendency to shift their stated positions toward perceived user preferences or authority cues regardless of evidence. Standard alignment methods fail to corre…