Optimus: A Robust Defense Framework for Mitigating Toxicity while Fine-Tuning Conversational AI
arXiv:2507.05660v2 Announce Type: replace-cross
Abstract: Customizing Large Language Models (LLMs) on untrusted datasets poses severe risks of injecting toxic behaviors. In this work, we introduce Optimus, a novel defense framework designed to mitigat…