PACED: Distillation and On-Policy Self-Distillation at the Frontier of Student Competence
arXiv:2603.11178v3 Announce Type: replace-cross
Abstract: Standard LLM distillation treats all training problems equally, wasting compute on problems the student has already mastered or cannot yet solve. We empirically show that this inefficiency ha…