cs.AI

Respecting Self-Uncertainty in On-Policy Self-Distillation for Efficient LLM Reasoning

arXiv:2605.13255v1 Announce Type: new
Abstract: On-policy self-distillation trains a reasoning model on its own rollouts while a teacher, often the same model conditioned on privileged context, provides dense token-level supervision. Existing objectiv…
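The setup described in the abstract — a student trained on its own rollouts against dense token-level supervision from a teacher that is the same model with privileged context — can be sketched minimally as a per-token forward KL between the two distributions. This is an illustrative sketch only, not the paper's actual objective (the abstract is truncated before the objective is stated); the function names `token_kl` and `self_distill_loss` and the toy distributions are assumptions for illustration.

```python
import math

def token_kl(p_teacher, q_student):
    """Forward KL D(p || q) at one token position, summed over the vocabulary.
    Skips vocabulary entries with zero teacher mass."""
    return sum(p * math.log(p / q) for p, q in zip(p_teacher, q_student) if p > 0)

def self_distill_loss(teacher_dists, student_dists):
    """Mean per-token KL over a rollout. Assumption: 'dense token-level
    supervision' means one KL term per generated token, where the teacher
    is the same model conditioned on privileged context and the student
    is the unconditioned policy that produced the rollout."""
    kls = [token_kl(p, q) for p, q in zip(teacher_dists, student_dists)]
    return sum(kls) / len(kls)

# Toy rollout: 2 token positions over a 3-word vocabulary (made-up numbers).
teacher = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
student = [[0.5, 0.3, 0.2], [0.1, 0.8, 0.1]]
loss = self_distill_loss(teacher, student)
```

Because the supervision is distribution-to-distribution at every position rather than a single sequence-level reward, the gradient signal is dense; when teacher and student agree at a position, that position contributes zero loss.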