Senses Wide Shut: A Representation-Action Gap in Omnimodal LLMs
arXiv:2605.13737v1 Announce Type: new
Abstract: When an omnimodal large language model accepts a question whose textual premise contradicts what it actually sees or hears, does the failure lie in perception or in action? Recent omnimodal models are po…