Zicheng Zhao, Chaofan Gan, Shijie Li, Weiyao Lin

From Priors to Perception: Grounding Video-LLMs in Physical Reality

May 7, 2026

arXiv:2605.04515v1 Announce Type: new
Abstract: While Video Large Language Models (Video-LLMs) excel in general understanding, they exhibit systematic deficits in fine-grained physical reasoning. Existing interventions not only suffer from limited gen…
