Weakly-Supervised Referring Video Object Segmentation through Text Supervision
arXiv:2604.17797v2
Abstract: Referring video object segmentation (RVOS) aims to segment the target instance in a video that is referred to by a text expression. Conventional approaches mostly rely on supervised learning, requiring expensive p…