Zihao Han, Tiangang Zhang, Huaibin Wang, Yilun Sun

Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

Zihao Han, Tiangang Zhang, Huaibin Wang, Yilun Sun / May 13, 2026

arXiv:2605.11458v1 Announce Type: cross
Abstract: On-policy self-distillation has become a strong recipe for LLM reasoning, where a privileged teacher supervises the student’s own rollouts while conditioning on the reference solution. A design choice …
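The abstract describes the base recipe only briefly. The same model scores the student's own rollout twice: once as the student (prompt only) and once as a privileged teacher that also sees the reference solution. The sketch below illustrates one common way to turn that into a per-token training signal. It assumes a reverse KL (student || teacher) averaged over rollout positions and works on plain lists of logits. The abstract does not state which divergence the authors use, and this sketch does not implement their adaptive teacher exposure. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over one vocabulary distribution.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def reverse_kl(student_logits: Sequence[float], teacher_logits: Sequence[float]) -> float:
    # KL(student || teacher) at a single token position (assumed divergence).
    ls = log_softmax(student_logits)
    lt = log_softmax(teacher_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(ls, lt))


def self_distillation_loss(
    student_logits_seq: Sequence[Sequence[float]],
    teacher_logits_seq: Sequence[Sequence[float]],
) -> float:
    # Hypothetical helper: average per-token reverse KL over one student rollout.
    # student_logits_seq[t]: logits at step t, conditioned on the prompt only.
    # teacher_logits_seq[t]: the same model's logits at step t, additionally
    #   conditioned on the reference solution (the privileged context).
    if len(student_logits_seq) != len(teacher_logits_seq):
        raise ValueError("student and teacher must score the same rollout")
    if not student_logits_seq:
        return 0.0
    total = sum(
        reverse_kl(s, t) for s, t in zip(student_logits_seq, teacher_logits_seq)
    )
    return total / len(student_logits_seq)
```

The loss is zero when the privileged teacher adds no information (identical distributions) and grows as the reference solution pulls the teacher away from the student. Because the tokens come from the student's own samples, the supervision stays on-policy.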
