cs.AI, cs.LG

RIFT: Repurposing Negative Samples via Reward-Informed Fine-Tuning

arXiv:2601.09253v2 Announce Type: replace-cross
Abstract: While Supervised Fine-Tuning (SFT) and Rejection Sampling Fine-Tuning (RFT) are standard for LLM alignment, SFT relies on costly expert data and RFT discards valuable negative samples, leading …