Omni2Sound: Towards Unified Video-Text-to-Audio Generation
arXiv:2601.02731v3 Announce Type: replace-cross
Abstract: Training a unified model that integrates video-to-audio (V2A), text-to-audio (T2A), and joint video-text-to-audio (VT2A) generation offers significant application flexibility, yet faces two unexplo…