Afrozah Nadeem, Mark Dras, Usman Naseem

Fairness Evaluation and Inference Level Mitigation in LLMs

Afrozah Nadeem, Mark Dras, Usman Naseem / April 8, 2026

arXiv:2510.18914v3 Announce Type: replace-cross
Abstract: Large language models often display undesirable behaviors embedded in their internal representations, undermining fairness and leading to inconsistency drift, amplification of harmful content, and the propag…
