SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning
arXiv:2603.27437v2 Announce Type: replace
Abstract: Large vision-language models (VLMs) still struggle with reliable 3D spatial reasoning, a core capability for embodied and physical AI systems. This limitation arises from their inability to capture f…