Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling
arXiv:2507.01679v3
Abstract: Existing post-training techniques for LLMs are broadly categorized into supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT). Each paradigm presents a distinct trade-off: (1) SFT excels…