cs.AI, cs.CV

Dual-Anchoring: Addressing State Drift in Vision-Language Navigation

arXiv:2604.17473v1 Announce Type: new
Abstract: Vision-Language Navigation (VLN) requires an agent to navigate through 3D environments by following natural language instructions. While recent Video Large Language Models (Video-LLMs) have largely advance…