Same Content, Different Answers: Cross-Modal Inconsistency in MLLMs
arXiv:2512.08923v2 Announce Type: replace
Abstract: We introduce two new benchmarks, REST and REST+ (Render-Equivalence Stress Tests), to enable systematic evaluation of cross-modal inconsistency in multimodal large language models (MLLMs). MLLMs are tr…