OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
arXiv:2505.01448v2
Abstract: Audio-visual segmentation aims to separate sounding objects from videos by predicting pixel-level masks based on audio signals. Existing methods primarily concentrate on closed-set scenarios and dire…