Probing Cross-modal Information Hubs in Audio-Visual LLMs
arXiv:2605.10815v2 Announce Type: new
Abstract: Audio-visual large language models (AVLLMs) have recently emerged as a powerful architecture capable of jointly reasoning over audio, visual, and textual modalities. In AVLLMs, the bidirectional interact…