Yanran Li - Provide.ai

Calibrate, Don’t Curate: Label-Efficient Estimation from Noisy LLM Judges

Yanran Li / May 12, 2026

arXiv:2605.09702v1 Announce Type: cross
Abstract: Multi-judge evaluation is increasingly used to assess LLMs and reward models, and the prevailing heuristic is to curate: keep the most accurate judges and discard weaker ones. We show that this heurist…
