cs.AI, cs.SD, eess.AS

Generalizable Audio-Visual Navigation via Binaural Difference Attention and Action Transition Prediction

arXiv:2604.05007v1 Announce Type: cross
Abstract: In Audio-Visual Navigation (AVN), agents must locate sound sources in unseen 3D environments using visual and auditory cues. However, existing methods often struggle with generalization in unseen scena…