UniSD: Towards a Unified Self-Distillation Framework for Large Language Models
arXiv:2605.06597v1
Abstract: Self-distillation (SD) offers a promising path for adapting large language models (LLMs) without relying on stronger external teachers. However, SD in autoregressive LLMs remains challenging because self…