Continual Safety Alignment via Gradient-Based Sample Selection
arXiv:2604.17215v1
Abstract: Large language models require continuous adaptation to new tasks while preserving safety alignment. However, fine-tuning on even benign data often compromises safety behaviors, including refusal of harmf…