On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training
arXiv:2601.07389v2 Announce Type: replace
Abstract: Post-training of large language models routinely interleaves supervised fine-tuning (SFT) with reinforcement learning (RL). These two methods have different objectives: SFT minimizes the cross-entrop…