Intrinsic Mutual Information as a Modulator for Preference Optimization
arXiv:2604.24804v1 Announce Type: cross
Abstract: Offline preference optimization methods, such as Direct Preference Optimization (DPO), offer significant advantages in aligning Large Language Models (LLMs) with human values. However, achieving optima…