From Priors to Perception: Grounding Video-LLMs in Physical Reality
arXiv:2605.04515v1 Announce Type: new
Abstract: While Video Large Language Models (Video-LLMs) excel in general understanding, they exhibit systematic deficits in fine-grained physical reasoning. Existing interventions not only suffer from limited gen…