cs.AI, cs.LG

Unlearners Can Lie: Evaluating and Improving Honesty in LLM Unlearning

arXiv:2605.08765v1 Announce Type: cross
Abstract: Unlearning in large language models (LLMs) aims to remove harmful training data while preserving overall utility. However, we find that existing methods often hallucinate, generate abnormal token seque…