cs.CL, cs.NA, math.NA

KazByte: Adapting Qwen models to Kazakh via Byte-level Adapter

arXiv:2603.27859v1 Announce Type: new
Abstract: Large language models fragment Kazakh text into many more tokens than equivalent English text, because their tokenizers were built for high-resource languages. This tokenizer tax inflates compute, shorte…
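
One contributing factor to the fragmentation described in the abstract can be seen at the byte level. The sketch below is an illustration only, not the paper's measurement: the helper name `utf8_bytes_per_char` and both sample sentences are assumptions introduced here. It relies on two facts: Kazakh Cyrillic letters each take two bytes in UTF-8, and byte-level BPE tokenizers build tokens by merging UTF-8 bytes. Kazakh text therefore starts from roughly twice as many base units per character as ASCII English, before any learned merges are applied.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Return the average number of UTF-8 bytes per character in `text`."""
    if not text:
        raise ValueError("text must be non-empty")
    return len(text.encode("utf-8")) / len(text)


# Illustrative sample sentences (assumptions, not data from the paper).
english = "Hello, world"
kazakh = "Сәлем, әлем"

english_ratio = utf8_bytes_per_char(english)
kazakh_ratio = utf8_bytes_per_char(kazakh)

print(f"English: {english_ratio:.2f} bytes/char")
print(f"Kazakh:  {kazakh_ratio:.2f} bytes/char")
```

Byte counts are only a proxy. The actual token count depends on which byte merges a given tokenizer learned, so measuring the real overhead would require running the model's own tokenizer on parallel text.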