Disentangled Robot Learning via Separate Forward and Inverse Dynamics Pretraining
arXiv:2604.16391v1 Announce Type: new
Abstract: Vision-language-action (VLA) models have shown great potential for building generalist robots, but they still face a dilemma: the misalignment between 2D image forecasting and 3D action prediction. Besides, such a visio…