Gradient-Gated DPO: Stabilizing Preference Optimization in Language Models
arXiv:2605.02626v1
Abstract: Preference optimization has become a central paradigm for aligning large language models with human feedback. Direct Preference Optimization (DPO) simplifies reinforcement learning from human feedback by…
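For context, the standard DPO objective that the abstract refers to can be sketched as follows. This is a minimal illustration of vanilla DPO (a logistic loss on the beta-scaled reward margin between chosen and rejected responses), not the paper's gradient-gated variant; the function name and per-pair formulation are illustrative assumptions.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single preference pair (illustrative sketch).

    The implicit reward of a response is the beta-scaled log-ratio between
    the policy and a frozen reference model; the loss is -log sigmoid of
    the reward margin between the chosen and rejected responses.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log sigmoid(margin), written in the numerically stable
    # form log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# When the policy matches the reference, the margin is zero and the
# loss is log 2; widening the margin in favour of the chosen response
# drives the loss toward zero.
print(dpo_loss(0.0, 0.0, 0.0, 0.0))          # log 2 at zero margin
print(dpo_loss(-1.0, -2.0, -1.5, -1.5))      # smaller loss, positive margin
```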