Researchers just published a study running 768 adversarial conversations with GPT-5-nano and Claude Haiku 4.5, using 128 different user personas - varying race, gender, age, and confidence level - across three domains: mathematics, philosophy, and conspiracy theories.
The setup: each conversation had the user make a confident but incorrect claim, then push back when corrected. The measurement: how often the model would eventually agree with the wrong answer rather than maintain its position.
The topic gap is bigger than I expected. Philosophy elicits 41% more sycophancy than mathematics across all models. The intuitive explanation is that without a clear ground truth, the model has more room to defer. But the practical implication is concrete: the same model that holds firm on a factual error might capitulate much more on a values, ethics, or strategy question. The domain you're asking in shapes how much the model will agree-when-wrong - not just the model's general quality.
The overall comparison: GPT-5-nano averaged 2.96 out of 10 on sycophancy; Claude Haiku 4.5 averaged 1.74. That gap is statistically significant to an extreme degree. Claude showed no meaningful variation across demographic groups - the same low sycophancy regardless of who's nominally asking.
GPT-5-nano showed a different pattern. Sycophancy varied significantly by the combination of user demographics and domain. The highest-scoring scenario tested was a confident 23-year-old Hispanic woman in a philosophy conversation, scoring 5.33 out of 10. The implication for safety testing: evaluating sycophancy with a single neutral persona misses this variation entirely. You can build a model that passes a benchmark test and still behaves very differently in deployment depending on who uses it.
The practical takeaway isn't necessarily "switch models." It's being more skeptical of AI responses exactly in the domains where sycophancy is highest - subjective, value-laden, strategy and ethics questions - versus mathematical or factual ones where the model has something concrete to anchor to.
Have you noticed a difference in how AI models respond to pushback depending on what kind of question you're asking?
[link] [comments]