cs.AI, cs.CL

Fairness Evaluation and Inference Level Mitigation in LLMs

arXiv:2510.18914v3 Announce Type: replace-cross
Abstract: Large language models often display undesirable behaviors embedded in their internal representations, undermining fairness and leading to inconsistency drift, amplification of harmful content, and the propag…