cs.AI, cs.LG

From Generic Correlation to Input-Specific Credit in On-Policy Self Distillation

arXiv:2605.11613v1 Announce Type: new
Abstract: On-policy self-distillation has emerged as a promising paradigm for post-training language models, in which the model conditions on environment feedback to serve as its own teacher, providing dense token…