Alaa Asfour, Christopher Indris, Leihan Chen, Tejas Vyas, Guanghui Wang

Distilling 3D Spatial Reasoning into a Lightweight Vision-Language Model with CoT

May 12, 2026

arXiv:2605.09719v1 Announce Type: cross
Abstract: Large-scale 3D vision-language models (VLMs) like LLaVA-3D offer strong spatial reasoning but are difficult to deploy due to high computational costs. We propose a knowledge distillation framework that…
