Jiangye Yuan, Gowri Kumar, Baoyuan Wang

Boosting MLLM Spatial Reasoning with Geometrically Referenced 3D Scene Representations

April 28, 2026

arXiv:2603.08592v2
Abstract: While Multimodal Large Language Models (MLLMs) have achieved remarkable success in 2D visual understanding, their ability to reason about 3D space remains limited. To address this gap, we introduce g…
