Adarsh Kumarappan, Ananya Mujoo

Not Just RLHF: Why Alignment Alone Won’t Fix Multi-Agent Sycophancy

Adarsh Kumarappan, Ananya Mujoo / May 14, 2026

arXiv:2605.12991v1
Abstract: LLM-based multi-agent pipelines flip from correct to incorrect answers under simulated peer disagreement at rates we term yield, a vulnerability widely attributed to RLHF-induced sycophancy. We test this…
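The abstract names the metric "yield" but defines it only informally, as the rate at which answers flip from correct to incorrect under simulated peer disagreement. The sketch below is one minimal reading of that definition. It assumes that the denominator is the set of trials the agent answered correctly before the disagreement was injected, and that the numerator is the subset of those trials that ended up incorrect afterward. The `Trial` record and the `yield_rate` function are hypothetical names for illustration, not the authors' code, and the paper's exact definition may differ.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One question posed to an agent, before and after simulated peer disagreement.

    Hypothetical record for illustration; not taken from the paper.
    """
    initial_correct: bool  # answer was correct before any pushback
    final_correct: bool    # answer was correct after the simulated peer disagreed


def yield_rate(trials: list[Trial]) -> float:
    """Fraction of initially correct answers that flip to incorrect under disagreement.

    Assumption: only trials that started out correct are counted in the
    denominator, because an answer that was already wrong cannot "yield".
    """
    eligible = [t for t in trials if t.initial_correct]
    if not eligible:
        # No initially correct answers: we define yield as 0.0 here (an assumption).
        return 0.0
    flipped = sum(1 for t in eligible if not t.final_correct)
    return flipped / len(eligible)


if __name__ == "__main__":
    trials = [
        Trial(initial_correct=True, final_correct=False),   # flipped
        Trial(initial_correct=True, final_correct=True),    # held firm
        Trial(initial_correct=True, final_correct=False),   # flipped
        Trial(initial_correct=False, final_correct=False),  # excluded: wrong from the start
    ]
    print(f"yield = {yield_rate(trials):.2f}")  # 2 of 3 eligible trials flipped -> 0.67
```

Restricting the denominator to initially correct trials keeps the metric focused on correct-to-incorrect flips. Under this reading, an agent that was already wrong and stays wrong neither raises nor lowers its yield.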
