Mind the Gap: Structure-Aware Consistency in Preference Learning
arXiv:2604.27733v1 Announce Type: new
Abstract: Preference learning has become the foundation for aligning Large Language Models (LLMs) with human intent. Popular methods, such as Direct Preference Optimization (DPO), minimize surrogate losses as proxi…