MVP-LAM: Learning Action-Centric Latent Action via Cross-Viewpoint Reconstruction
arXiv:2602.03668v2
Abstract: Latent actions learned from diverse human videos serve as pseudo-labels for vision-language-action (VLA) pretraining, but provide effective supervision only if they remain informative about the under…