Adapting Text LLMs to Speech via Multimodal Depth Up-Scaling
arXiv:2604.00489v1 Announce Type: new
Abstract: Adapting pretrained text Large Language Models (LLMs) into Speech Language Models (Speech LMs) via continual pretraining on speech data is promising, but it often degrades the original text capabilities. W…