Abstract 3D Perception for Spatial Intelligence in Vision-Language Models
arXiv:2511.10946v3 Announce Type: replace
Abstract: Vision-language models (VLMs) struggle with 3D-related tasks such as spatial cognition and physical understanding, which are crucial for real-world applications like robotics and embodied agents. We …