cs.CV

Learning to Track Instance from Single Nature Language Description

arXiv:2605.07064v1 Announce Type: new
Abstract: How can vision-language (VL) tracking be achieved from a video sequence using natural language descriptions, without relying on any bounding-box ground truth? In this work, we achieve this goal by tac…