3D CAVLA: Leveraging Depth and 3D Context to Generalize Vision Language Action Models for Unseen Tasks
arXiv:2505.05800v2 Announce Type: replace-cross
Abstract: Robotic manipulation in 3D requires effective computation of N degree-of-freedom joint-space trajectories that enable precise and robust control. To achieve this, robots must integrate semantic…