Fairness Evaluation and Inference Level Mitigation in LLMs
arXiv:2510.18914v3 Announce Type: replace-cross
Abstract: Large language models often display undesirable behaviors embedded in their internal representations, which undermine fairness and give rise to inconsistency drift, amplification of harmful content, and the propag…