M2-Verify: A Large-Scale Multidomain Benchmark for Checking Multimodal Claim Consistency
arXiv:2604.01306v1
Abstract: Evaluating scientific arguments requires assessing the strict consistency between a claim and its underlying multimodal evidence. However, existing benchmarks lack the scale, domain diversity, and visual…