CoMaTrack: Competitive Multi-Agent Game-Theoretic Tracking with Vision-Language-Action Models
arXiv:2603.22846v2 Announce Type: replace
Abstract: Embodied Visual Tracking (EVT), a core dynamic task in embodied intelligence, requires an agent to precisely follow a language-specified target. Yet most existing methods rely on single-agent imitati…