cs.CL, cs.LG

Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR

arXiv:2605.10781v1 Announce Type: cross
Abstract: Self-distillation has emerged as a powerful framework for post-training LLMs: a teacher conditioned on extra information guides a student that lacks it, with both instantiated from the same model. While this guidance…