From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
arXiv:2509.20072v4
Abstract: Recent advances in large language models (LLMs) have spurred significant interest in extending their capabilities to multimodal scenarios, particularly speech-to-speech conversational systems. …