Environmental Understanding Vision-Language Model for Embodied Agent
arXiv:2604.19839v1
Abstract: Vision-language models (VLMs) have shown strong perception and reasoning abilities for instruction-following embodied agents. However, despite these abilities and their generalization performance, they s…