Thong Bach, Dung Nguyen, Thao Minh Le, Truyen Tran

Continual Safety Alignment via Gradient-Based Sample Selection

April 21, 2026

arXiv:2604.17215v1
Abstract: Large language models require continuous adaptation to new tasks while preserving safety alignment. However, fine-tuning on even benign data often compromises safety behaviors, including refusal of harmf…
