Robust Audio-Text Retrieval via Cross-Modal Attention and Hybrid Loss
arXiv:2604.23323v1 Announce Type: new
Abstract: Audio-text retrieval enables semantic alignment between audio content and natural language queries, supporting applications in multimedia search, accessibility, and surveillance. However, current state-o…