Early Data Exposure Improves Robustness to Subsequent Fine-Tuning
arXiv:2605.12705v1
Abstract: How can we train models whose post-trained capabilities survive subsequent fine-tuning? Rather than focusing on downstream interventions to mitigate forgetting of upstream capabilities, we study how upst…