MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation
arXiv:2510.09065v2
Abstract: We introduce MMAudioSep, a generative model for video/text-queried sound separation that is founded on a pretrained video-to-audio model. By leveraging knowledge about the relationship between …