From Where Words Come: Efficient Regularization of Code Tokenizers Through Source Attribution
arXiv:2604.14053v1 Announce Type: new
Abstract: The efficiency and safety of Large Language Models (LLMs) depend, among other factors, on the quality of tokenization. A good tokenizer not only improves inference speed and language understanding but also pro…