Authors: Jiang Zhang, Shijie Zhou, Bangya Liu, Achuta Kadambi, Zhiwen Fan

SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning

Jiang Zhang, Shijie Zhou, Bangya Liu, Achuta Kadambi, Zhiwen Fan / April 21, 2026

arXiv:2603.27437v2 Announce Type: replace
Abstract: Large vision-language models (VLMs) still struggle with reliable 3D spatial reasoning, a core capability for embodied and physical AI systems. This limitation arises from their inability to capture f…

cs.CV

SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning

Jiang Zhang, Shijie Zhou, Bangya Liu, Achuta Kadambi, Zhiwen Fan / March 31, 2026

arXiv:2603.27437v1 Announce Type: new
Abstract: Large vision-language models (VLMs) still struggle with reliable 3D spatial reasoning, a core capability for embodied and physical AI systems. This limitation arises from their inability to capture fine-…