cs.CL, cs.IR, cs.LG

VERDI: Single-Call Confidence Estimation for Verification-Based LLM Judges via Decomposed Inference

arXiv:2605.11334v1 Announce Type: cross
Abstract: LLM-as-Judge systems are widely deployed for automated evaluation, yet practitioners lack reliable methods for deciding when a judge’s verdict should be trusted. Token log-probabilities, the standard post-h…