Xueyan Niu, Bo Bai, Wei Han, Weixi Zhang

On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training

Xueyan Niu, Bo Bai, Wei Han, Weixi Zhang / May 7, 2026

arXiv:2601.07389v2
Abstract: Post-training of large language models routinely interleaves supervised fine-tuning (SFT) with reinforcement learning (RL). These two methods have different objectives: SFT minimizes the cross-entrop…
