Context Over Content: Exposing Evaluation Faking in Automated Judges
arXiv:2604.15224v1 Announce Type: cross
Abstract: The $\textit{LLM-as-a-judge}$ paradigm has become the operational backbone of automated AI evaluation pipelines, yet it rests on an unverified assumption: that judges evaluate text strictly on its semanti…