SRA: Span Representation Alignment for Large Language Model Distillation
arXiv:2605.01205v1
Abstract: Cross-Tokenizer Knowledge Distillation (CTKD) enables knowledge transfer between a large teacher language model and a smaller student, even when the two employ different tokenizers. While existing approaches mainly…