Code-switching in text and speech challenges information-theoretic speaker design
arXiv:2408.04596v2
Abstract: We use language modeling to investigate the factors that influence insertional code-switching. Code-switching occurs when a speaker alternates between one language variety (the primary language) and another (the secondary language), and is widely observed in multilingual contexts. Recent work has shown that code-switching is often correlated with areas of low predictability in the primary language. It remains unclear, however, whether low primary-language predictability only makes the secondary language relatively easier to produce at code-switching points (that is, purely speaker-driven code-switching), or whether speakers additionally use code-switching for other purposes, for instance to signal the need for greater attention on the part of listeners. Using bilingual Chinese-English online forum posts and transcripts of spontaneous Chinese-English speech, we replicate prior findings that low primary-language (Chinese) predictability is correlated with insertional switches to the secondary language (English). We then demonstrate that the English productions are even less predictable than meaning-equivalent Chinese alternatives and are therefore not easier to produce, which rejects the purely speaker-driven theory of code-switching in both writing and speech.
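
The comparison at the heart of the abstract can be illustrated with a minimal sketch. It assumes that predictability is measured as surprisal, the negative log probability a language model assigns to a word in context; the abstract itself does not name the measure or the models used. The function names and the probabilities below are hypothetical and chosen only for illustration, not taken from the paper's data.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Surprisal in bits, -log2(p): the lower the probability, the higher the surprisal."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def switched_form_is_easier(p_switched: float, p_primary_alt: float) -> bool:
    """Return True if the switched (secondary-language) form has lower surprisal
    than the meaning-equivalent primary-language form.

    Under a purely speaker-driven account, this is expected to be True
    at code-switching points.
    """
    return surprisal_bits(p_switched) < surprisal_bits(p_primary_alt)


# Made-up probabilities, for illustration only (not results from the paper).
p_english = 1 / 1024  # hypothetical P(English word | context)
p_chinese = 1 / 256   # hypothetical P(Chinese equivalent | context)

print(surprisal_bits(p_english))                      # 10.0
print(surprisal_bits(p_chinese))                      # 8.0
print(switched_form_is_easier(p_english, p_chinese))  # False
```

In this toy case the English form carries higher surprisal than its Chinese equivalent, which is the pattern the abstract reports and the one that a purely speaker-driven account does not predict.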