Jixuan Leng, Si Si, Hsiang-Fu Yu, Vinod Raman, Inderjit S. Dhillon

GroupDPO: Memory efficient Group-wise Direct Preference Optimization

Jixuan Leng, Si Si, Hsiang-Fu Yu, Vinod Raman, Inderjit S. Dhillon / April 20, 2026

arXiv:2604.15602v1 Announce Type: new
Abstract: Preference optimization is widely used to align Large Language Models (LLMs) with preference feedback. However, most existing methods train on a single positive-negative pair per prompt, discarding addit…

Author name: Jixuan Leng, Si Si, Hsiang-Fu Yu, Vinod Raman, Inderjit S. Dhillon

GroupDPO: Memory efficient Group-wise Direct Preference Optimization