cs.AI, cs.LG

Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training

arXiv:2605.11134v1 Announce Type: new
Abstract: Preference learning methods such as Direct Preference Optimization (DPO) are known to induce reliance on spurious correlations, leading to sycophancy and length bias in today’s language models and potent…