cs.AI, cs.CL, cs.CR, cs.LG, math.OC

Secure LLM Fine-Tuning via Safety-Aware Probing

arXiv:2505.16737v2 Announce Type: replace-cross
Abstract: Large language models (LLMs) have achieved remarkable success across many applications, but their ability to generate harmful content raises serious safety concerns. Although safety alignment t…