cs.AI, cs.CL, cs.CV

How Far Are Vision-Language Models from Constructing the Real World? A Benchmark for Physical Generative Reasoning

arXiv:2603.24866v1 Announce Type: cross
Abstract: The physical world is not merely visual; it is governed by rigorous structural and procedural constraints. Yet the evaluation of vision-language models (VLMs) remains heavily skewed toward perceptual …