cs.CL

Toward LLMs Beyond English-Centric Development

arXiv:2605.15613v1
Abstract: Through an analysis of sequences generated by open-weight large language models (LLMs), we demonstrate that LLMs are heavily biased toward English. While continual pre-training is commonly used to adapt …