Spatial-Aware Conditioned Fusion for Audio-Visual Navigation
arXiv:2604.02390v1 Announce Type: cross
Abstract: Audio-visual navigation tasks require agents to locate and navigate toward continuously vocalizing targets using only visual observations and acoustic cues. However, existing methods mainly rely on sim…