Pruning Unsafe Tickets: A Resource-Efficient Framework for Safer and More Robust LLMs
arXiv:2604.15780v1 Announce Type: new
Abstract: Machine learning models are increasingly deployed in real-world applications, but even aligned models such as Mistral and LLaVA still exhibit unsafe behaviors inherited from pre-training. Current alignme…