cs.CV, cs.RO

ST-$\pi$: Structured SpatioTemporal VLA for Robotic Manipulation

arXiv:2604.17880v1 Announce Type: new
Abstract: Vision-language-action (VLA) models have achieved great success on general robotic tasks, but they still struggle with fine-grained spatiotemporal manipulation. Existing methods mainly embed s…