Ishaan Watts, Catherine Li, Sachin Goyal, Jacob Mitchell Springer, Aditi Raghunathan

Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

May 5, 2026

arXiv:2605.02105v1 Announce Type: new
Abstract: Pretraining optimizers are tuned to produce the strongest possible base model, on the assumption that a stronger starting point yields a stronger model after subsequent changes like post-training and qua…
