Proximal Supervised Fine-Tuning
arXiv:2508.17784v2
Abstract: Supervised fine-tuning (SFT) of foundation models often leads to poor generalization, where prior capabilities deteriorate after tuning on new tasks or domains. Inspired by trust-region policy …