cs.CL, cs.CV, cs.LG, stat.ML

VLM Judges Can Rank but Cannot Score: Task-Dependent Uncertainty in Multimodal Evaluation

arXiv:2604.25235v1 Announce Type: cross
Abstract: Vision-language models (VLMs) are increasingly used as automated judges for multimodal systems, yet their scores provide no indication of reliability. We study this problem through conformal prediction…
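Conformal prediction can attach a reliability estimate to a judge's score by turning each score into an interval. The excerpt above is truncated and does not say which conformal variant or nonconformity score the authors use, so the sketch below is generic split conformal prediction, not the paper's method. It assumes a held-out calibration set of (judge score, human score) pairs and uses the absolute judge-human gap as the nonconformity score. The function names `conformal_quantile`, `calibrate` and `predict_interval` are hypothetical.

```python
import math


def conformal_quantile(residuals, alpha):
    """Split-conformal threshold for miscoverage level alpha (hypothetical helper)."""
    n = len(residuals)
    if n == 0:
        raise ValueError("need at least one calibration example")
    # Finite-sample corrected rank ceil((n + 1) * (1 - alpha)); the tiny epsilon
    # guards against floating-point round-up (e.g. 8.000000001 -> 9).
    k = math.ceil((n + 1) * (1 - alpha) - 1e-12)
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return sorted(residuals)[k - 1]


def calibrate(judge_scores, human_scores, alpha=0.1):
    """Assumed nonconformity score: absolute gap between judge and human rating."""
    residuals = [abs(j - h) for j, h in zip(judge_scores, human_scores)]
    return conformal_quantile(residuals, alpha)


def predict_interval(judge_score, q):
    """Symmetric interval around a new judge score using the calibrated threshold q."""
    return (judge_score - q, judge_score + q)


if __name__ == "__main__":
    # Illustrative calibration pairs on a 1-5 scale (made-up numbers, not paper data).
    judge = [3.0, 4.0, 2.0, 5.0, 1.0, 4.0, 3.0, 2.0, 5.0]
    human = [3.0, 3.0, 2.0, 4.0, 2.0, 4.0, 1.0, 2.0, 5.0]
    q = calibrate(judge, human, alpha=0.2)
    print(q, predict_interval(4.0, q))
```

Under exchangeability of calibration and test pairs, the interval contains the human score with probability at least `1 - alpha` marginally; it says nothing about coverage for any single item. A wide threshold `q` signals that the judge's point scores are unreliable on that data, even when its relative ordering may still be useful.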