Yuxiao Yang, Xiaoyun Wang, Weitong Zhang

OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning

Yuxiao Yang, Xiaoyun Wang, Weitong Zhang / May 13, 2026

arXiv:2605.12400v1 Announce Type: new
Abstract: We study on-policy self-distillation (OPSD), where a language model improves its reasoning ability by distilling privileged teacher distributions along its own on-policy trajectories. Despite the perfo…
