FocusVLA: Focused Visual Utilization for Vision-Language-Action Models
arXiv:2603.28740v1 Announce Type: new
Abstract: Vision-Language-Action (VLA) models improve action generation by conditioning policies on rich vision-language information. However, current auto-regressive policies are constrained by three bottlenecks:…