The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models
arXiv:2601.02954v3 Announce Type: replace-cross
Abstract: Large audio-language models have made rapid progress in recognizing what is present in an audio clip, but spatial audio-language understanding still lacks a clear task interface. A model must a…