Yifan Liu, Fangneng Zhan, Kaichen Zhou, Yilun Du, Paul Pu Liang, Hanspeter Pfister

Abstract 3D Perception for Spatial Intelligence in Vision-Language Models

Yifan Liu, Fangneng Zhan, Kaichen Zhou, Yilun Du, Paul Pu Liang, Hanspeter Pfister / April 16, 2026

arXiv:2511.10946v3 Announce Type: replace
Abstract: Vision-language models (VLMs) struggle with 3D-related tasks such as spatial cognition and physical understanding, which are crucial for real-world applications like robotics and embodied agents. We …

Author name: Yifan Liu, Fangneng Zhan, Kaichen Zhou, Yilun Du, Paul Pu Liang, Hanspeter Pfister

Abstract 3D Perception for Spatial Intelligence in Vision-Language Models