cs.CL

Adapting Text LLMs to Speech via Multimodal Depth Up-Scaling

arXiv:2604.00489v1 Announce Type: new
Abstract: Adapting pre-trained text Large Language Models (LLMs) into Speech Language Models (Speech LMs) via continual pretraining on speech data is promising but often degrades the models' original text capabilities. W…