cs.CV

MLLM-as-a-Judge Exhibits Model Preference Bias

arXiv:2604.11589v1 Announce Type: new
Abstract: Automatic evaluation using multimodal large language models (MLLMs), commonly referred to as MLLM-as-a-Judge, has been widely used to measure model performance. If such MLLM-as-a-Judge methods were biase…