Wenqian Cui, Xiao-Hui Li, Daxin Tan, Qiyong Zheng, Irwin King

Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM

Wenqian Cui, Xiao-Hui Li, Daxin Tan, Qiyong Zheng, Irwin King / May 8, 2026

arXiv:2605.05927v1 Announce Type: new
Abstract: Speech large language models (SLMs) are typically built from text large language model (TLM) checkpoints, yet they still suffer from a substantial modality gap. Prior work has mainly attempted to reduce …
