cs.CL, cs.CV, cs.RO

ProGAL-VLA: Grounded Alignment through Prospective Reasoning in Vision-Language-Action Models

arXiv:2604.09824v1 Announce Type: cross
Abstract: Vision-language-action (VLA) models enable generalist robotic agents but often exhibit language ignorance, relying on visual shortcuts and remaining insensitive to instruction changes. We present Prosp…