StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing
arXiv:2604.05014v1 Announce Type: cross
Abstract: Building generalist embodied agents requires integrating perception, language understanding, and action, which are core capabilities addressed by Vision-Language-Action (VLA) approaches based on multim…