James Fiedler - Provide.ai

Bias and Uncertainty in LLM-as-a-Judge Estimation

James Fiedler / May 11, 2026

arXiv:2605.06939v1
Abstract: LLM-as-a-Judge evaluation has become a standard tool for assessing base model performance. However, characterizing performance via the naive estimator, i.e., raw judge outputs, is systematically biased. …
