ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models
arXiv:2601.11404v2 Announce Type: replace
Abstract: Vision-Language-Action models have emerged as essential generalist robot policies for diverse manipulation tasks, conventionally relying on directly translating multimodal inputs into actions via Vis…