Multimodal In-context Learning for ASR of Low-resource Languages
arXiv:2601.05707v2 Announce Type: replace
Abstract: Automatic speech recognition (ASR) still covers only a small fraction of the world’s languages, mainly because supervised data is scarce. In-context learning (ICL) with large language models (LLMs) add…