XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations
arXiv:2511.02776v2 Announce Type: replace
Abstract: Recent progress in large-scale robotic datasets and vision-language models (VLMs) has advanced research on vision-language-action (VLA) models. However, existing VLA models still face two fundamental…