Boosting MLLM Spatial Reasoning with Geometrically Referenced 3D Scene Representations
arXiv:2603.08592v2 Announce Type: replace
Abstract: While Multimodal Large Language Models (MLLMs) have achieved remarkable success in 2D visual understanding, their ability to reason about 3D space remains limited. To address this gap, we introduce g…