cs.AI, cs.CL, cs.LG

Context Over Content: Exposing Evaluation Faking in Automated Judges

arXiv:2604.15224v1 Announce Type: cross
Abstract: The LLM-as-a-judge paradigm has become the operational backbone of automated AI evaluation pipelines, yet rests on an unverified assumption: that judges evaluate text strictly on its semanti…