cs.CL

Please refuse to answer me! Mitigating Over-Refusal in Large Language Models via Adaptive Contrastive Decoding

arXiv:2604.17132v1 Announce Type: new
Abstract: Safety-aligned large language models (LLMs) often generate refusal responses to harmless queries due to the over-refusal problem. However, existing methods for mitigating over-refusal cannot maintain a l…