LightAVSeg: Lightweight Audio-Visual Segmentation
arXiv:2605.08805v1 Announce Type: new
Abstract: Audio-Visual Segmentation (AVS) targets pixel-level localization of sound-emitting objects in videos. However, existing models rely on dense cross-modal attention with quadratic computational cost, li…