cs.CL

DART: Mitigating Harm Drift in Difference-Aware LLMs via Distill-Audit-Repair Training

arXiv:2604.16845v1 Announce Type: new
Abstract: Large language models (LLMs) tuned for safety often avoid acknowledging demographic differences, even when such acknowledgment is factually correct (e.g., ancestry-based disease incidence) or contextuall…