cs.AI, cs.LG

Not Just RLHF: Why Alignment Alone Won’t Fix Multi-Agent Sycophancy

arXiv:2605.12991v1 Announce Type: new
Abstract: LLM-based multi-agent pipelines flip from correct to incorrect answers under simulated peer disagreement, at a rate we term yield. This vulnerability is widely attributed to RLHF-induced sycophancy. We test this…
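
As an illustration of the yield metric named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes yield is the fraction of initially correct answers that become incorrect after simulated peer disagreement; the exact denominator used in the paper is not stated in the text above. The names `Trial` and `yield_rate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Trial:
    # Hypothetical record for one question posed to one agent.
    correct_before: bool  # answer was correct before any peer disagreement
    correct_after: bool   # answer was correct after simulated disagreement


def yield_rate(trials: Sequence[Trial]) -> Optional[float]:
    """Fraction of initially correct answers that flip to incorrect.

    Assumption: only trials that start out correct are eligible, since
    a flip from correct to incorrect requires a correct starting answer.
    Returns None when there are no eligible trials.
    """
    eligible = [t for t in trials if t.correct_before]
    if not eligible:
        return None
    flipped = sum(1 for t in eligible if not t.correct_after)
    return flipped / len(eligible)


# Toy data, invented for illustration only (not results from the paper).
trials = [
    Trial(correct_before=True, correct_after=True),
    Trial(correct_before=True, correct_after=False),
    Trial(correct_before=False, correct_after=False),
    Trial(correct_before=True, correct_after=False),
]
print(yield_rate(trials))
```

In this sketch, trials that start out incorrect are excluded from the denominator, so the value measures how often a correct answer is abandoned rather than overall accuracy after disagreement.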