cs.CL

DGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimization

arXiv:2605.10863v1 Announce Type: new
Abstract: Although Large Language Models (LLMs) have made remarkable progress, current preference optimization methods still struggle to achieve directional consistency while preserving reasoning diversity. To addre…