cs.CL

GroupDPO: Memory efficient Group-wise Direct Preference Optimization

arXiv:2604.15602v1 Announce Type: new
Abstract: Preference optimization is widely used to align Large Language Models (LLMs) with preference feedback. However, most existing methods train on a single positive-negative pair per prompt, discarding addit…