cs.AI, cs.CV

Learning Vision-Language-Action World Models for Autonomous Driving

arXiv:2604.09059v1 Announce Type: new
Abstract: Vision-Language-Action (VLA) models have recently achieved notable progress in end-to-end autonomous driving by integrating perception, reasoning, and control within a unified multimodal framework. Howev…