AV-Master: Dual-Path Comprehensive Perception Makes Better Audio-Visual Question Answering
arXiv:2510.18346v2 Announce Type: replace
Abstract: Audio-Visual Question Answering (AVQA) requires models to effectively utilize both visual and auditory modalities to answer complex and diverse questions about audio-visual scenes. However, existing …