cs.AI, cs.CV, eess.IV

Beyond Textual Knowledge: Leveraging Multimodal Knowledge Bases for Enhancing Vision-and-Language Navigation

arXiv:2603.26859v1 Announce Type: new
Abstract: Vision-and-Language Navigation (VLN) requires an agent to navigate through complex unseen environments based on natural language instructions. However, existing methods often struggle to effectively capt…