cs.CL

Model-Aware Tokenizer Transfer

arXiv:2510.21954v2 Announce Type: replace
Abstract: Large Language Models (LLMs) are trained to support an increasing number of languages, yet their predefined tokenizers remain a bottleneck for adapting models to lower-resource or distinct-script lan…