Jinsik Bang, Jaeyeon Bae, Donggyu Lee, Siyeol Jung, Taehwan Kim

Environmental Understanding Vision-Language Model for Embodied Agent

April 23, 2026

arXiv:2604.19839v1 Announce Type: new
Abstract: Vision-language models (VLMs) have shown strong perception and reasoning abilities for instruction-following embodied agents. However, despite these abilities and their generalization performance, they s…
