PRIMED: Adaptive Modality Suppression for Referring Audio-Visual Segmentation via Biased Competition
arXiv:2605.07154v1
Abstract: Referring Audio-Visual Segmentation (Ref-AVS) seeks to localize and segment target objects in video frames based on visual, auditory, and textual referring cues. The task is challenging because the relev…