MM-JudgeBias: A Benchmark for Evaluating Compositional Biases in MLLM-as-a-Judge
arXiv:2604.18164v2 Announce Type: replace-cross
Abstract: Multimodal Large Language Models (MLLMs) have been increasingly used as automatic evaluators, a paradigm known as MLLM-as-a-Judge. However, their reliability and vulnerabilities to biases remain…