GIFT: Group-Relative Implicit Fine-Tuning Integrates GRPO with DPO and UNA
arXiv:2510.23868v4 Announce Type: replace
Abstract: This paper proposes Group-Relative Implicit Fine-Tuning (GIFT), a reinforcement learning framework for aligning large language models (LLMs) that unifies on-policy optimization with implicit…