cs.AI, cs.CL, cs.CV

VideoGameBench: Can Vision-Language Models complete popular video games?

arXiv:2505.18134v3 Announce Type: replace
Abstract: Vision-language models (VLMs) have achieved strong results on coding and math benchmarks that are challenging for humans, yet their ability to perform tasks that come naturally to humans, such as per…