Improving LLM Unlearning Robustness via Random Perturbations
arXiv:2501.19202v5 Announce Type: replace
Abstract: Here, we show that current LLM unlearning methods inherently reduce models’ robustness, causing them to misbehave even when a single non-adversarial forget-token is present in the retain-query. Towar…