cs.LG, cs.MM

OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models

arXiv:2505.01448v2 Announce Type: replace
Abstract: Audio-visual segmentation aims to separate sounding objects from videos by predicting pixel-level masks based on audio signals. Existing methods primarily concentrate on closed-set scenarios and dire…