Manav Pandey - Provide.ai

LLMs Know They’re Wrong and Agree Anyway: The Shared Sycophancy-Lying Circuit

Manav Pandey / April 22, 2026

arXiv:2604.19117v1
Abstract: When a language model agrees with a user’s false belief, is it failing to detect the error, or noticing and agreeing anyway? We show the latter. Across twelve open-weight models from five labs, spanning …
