Fair and Calibrated Toxicity Detection with Robust Training and Abstention
arXiv:2605.14074v1 Announce Type: new
Abstract: Fairness in toxicity classification involves three integrated axes: ranking, calibration, and abstention. Training-time interventions and post-hoc safety mechanisms cannot be evaluated independently beca…
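To make the three axes named in the abstract concrete, here is a minimal sketch of how each could be measured on a set of toxicity scores. It is illustrative only and is not taken from the paper: the function names (`roc_auc`, `expected_calibration_error`, `abstain_mask`, `report`, `report_by_group`), the 0.5 decision point, the `margin` parameter, and the equal-width binning are all assumptions chosen for the example. Ranking is measured by pairwise AUC, calibration by a binned expected calibration error, and abstention by the fraction of inputs the classifier still answers after deferring on scores near the decision point.

```python
from typing import Dict, Hashable, List, Sequence

# All names and defaults below are hypothetical, chosen for illustration;
# they are not the paper's method or any library's API.


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Ranking: chance that a random toxic item outscores a random non-toxic one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one toxic and one non-toxic example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def expected_calibration_error(
    scores: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Calibration: size-weighted gap between mean score and toxic rate per bin."""
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, s in enumerate(scores):
        bins[min(int(s * n_bins), n_bins - 1)].append(i)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_score = sum(scores[i] for i in members) / len(members)
        toxic_rate = sum(labels[i] for i in members) / len(members)
        ece += (len(members) / len(scores)) * abs(mean_score - toxic_rate)
    return ece


def abstain_mask(scores: Sequence[float], margin: float = 0.2) -> List[bool]:
    """Abstention: defer (True) when a score lies within `margin` of 0.5."""
    return [abs(s - 0.5) < margin for s in scores]


def report(
    scores: Sequence[float], labels: Sequence[int], margin: float = 0.2
) -> Dict[str, float]:
    """Summarise all three axes for one set of examples."""
    deferred = abstain_mask(scores, margin)
    return {
        "auc": roc_auc(scores, labels),
        "ece": expected_calibration_error(scores, labels),
        "coverage": 1.0 - sum(deferred) / len(scores),
    }


def report_by_group(
    scores: Sequence[float],
    labels: Sequence[int],
    groups: Sequence[Hashable],
    margin: float = 0.2,
) -> Dict[Hashable, Dict[str, float]]:
    """Compute the same summary per group so that gaps between groups are visible.

    Raises ValueError (from roc_auc) if a group contains only one class.
    """
    out: Dict[Hashable, Dict[str, float]] = {}
    for g in sorted(set(groups), key=str):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        out[g] = report([scores[i] for i in idx], [labels[i] for i in idx], margin)
    return out
```

Reporting the three numbers side by side, and per group, is one way to see why they interact: changing the abstention margin changes which examples remain, which in turn can shift the ranking and calibration figures measured on the answered subset.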