Shuyao Shi, Kang G. Shin

Feeling the Space: Egomotion-Aware Video Representation for Efficient and Accurate 3D Scene Understanding

Shuyao Shi, Kang G. Shin / May 8, 2026

arXiv:2603.17980v2
Abstract: Recent Multimodal Large Language Models (MLLMs) have shown high potential for spatial reasoning within 3D scenes. However, they typically rely on computationally expensive 3D representations like poi…