Gaze-Regularized Vision-Language-Action Models for Robotic Manipulation
arXiv:2603.23202v2 Announce Type: replace
Abstract: Despite advances in Vision-Language-Action (VLA) models, robotic manipulation struggles with fine-grained tasks because current models lack mechanisms for active visual attention allocation. Human ga…