Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation
arXiv:2512.14954v2 Announce Type: replace-cross
Abstract: Computing next-token likelihood ratios between two language models (LMs) is a standard task in training paradigms such as knowledge distillation. Since this requires both models to share the sa…