OMHBench: Benchmarking Balanced and Grounded Omni-Modal Multi-Hop Reasoning
arXiv:2508.16198v3
Abstract: Multimodal Large Language Models (MLLMs) have increasingly supported omni-modal processing across text, vision, and speech. However, existing evaluation frameworks for such models suffer from critica…