Intersectional Sycophancy: How Perceived User Demographics Shape False Validation in Large Language Models
arXiv:2604.11609v2 Announce Type: replace
Abstract: Large language models exhibit sycophantic tendencies, but whether this behavior varies systematically with perceived user demographics remains underexplored. Inspired by intersectionality (the idea that overlapping identities produce compounded effects), we probe whether frontier models exhibit sycophancy conditioned on perceived user identity. Across 768 multi-turn conversations spanning 128 personas (varying race, age, gender, and confidence) and three domains (mathematics, philosophy, conspiracy theories), we find that sycophancy varies sharply with target model and domain, and emerges from combinations of perceived user traits rather than from any single dimension. GPT-5-nano scores far higher than Claude Haiku 4.5 (average sycophancy scores of $\bar{x}=2.96$ vs.\ $1.74$, $p < 10^{-32}$); within GPT-5-nano, philosophy elicits 41\% more sycophancy than mathematics, and Hispanic personas receive the highest scores across races. The worst-scoring persona, a confident 23-year-old Hispanic woman, averages 5.33/10 (max 6/10), while Claude Haiku 4.5 remains uniformly low with no significant demographic variation. We argue that safety evaluations should incorporate identity-aware adversarial testing.
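The study design implies a full factorial grid: 128 personas crossed with 3 domains and 2 target models yields the 768 conversations reported. A minimal sketch of that grid follows; the specific attribute levels (the race, age, gender, and confidence values) are illustrative assumptions, since the abstract does not enumerate them, and the sketch assumes each persona-domain pair is run once per model.

```python
# Illustrative reconstruction of the 128-persona x 3-domain x 2-model grid.
# The attribute levels below are assumptions chosen so the factorial
# product matches the reported counts; the paper's actual levels may differ.
from itertools import product

races = ["White", "Black", "Hispanic", "Asian"]                  # assumed: 4 levels
ages = [23, 38, 52, 67]                                          # assumed: 4 levels
genders = ["woman", "man"]                                       # assumed: 2 levels
confidences = ["tentative", "neutral", "assertive", "confident"] # assumed: 4 levels

# 4 * 4 * 2 * 4 = 128 personas, matching the abstract.
personas = [
    {"race": r, "age": a, "gender": g, "confidence": c}
    for r, a, g, c in product(races, ages, genders, confidences)
]
assert len(personas) == 128

domains = ["mathematics", "philosophy", "conspiracy theories"]
models = ["GPT-5-nano", "Claude Haiku 4.5"]

# 128 personas x 3 domains x 2 models = 768 conversations.
conversations = [
    {"persona": p, "domain": d, "model": m}
    for p, d, m in product(personas, domains, models)
]
print(len(conversations))  # 768
```

Under this assumed design, each model sees 384 conversations (128 personas in each of 3 domains), which is consistent with the per-model and per-domain comparisons the abstract reports.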