cs.AI, cs.CL, cs.LG

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

arXiv:2605.11609v1 Announce Type: cross
Abstract: On-policy self-distillation, where a student is pulled toward a copy of itself conditioned on privileged context (e.g., a verified solution or feedback), offers a promising direction for advancing reas…
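In the setup the abstract describes, the teacher is the student model itself, conditioned on privileged context c (for example, a verified solution). That makes a pointwise mutual information (PMI) signal between c and each sampled response token directly measurable: score the same tokens with and without c and take the difference of log-probabilities. The following is a minimal sketch of that textbook PMI quantity. It assumes per-token log-probabilities under both conditionings are already available. The function names `token_pmi` and `sequence_pmi` are hypothetical. The truncated abstract does not say how the paper turns this signal into an "anti-self-distillation" objective, so none is implied here.

```python
import math
from typing import Sequence


def token_pmi(logp_with_context: Sequence[float],
              logp_without_context: Sequence[float]) -> list[float]:
    """Per-token pointwise mutual information between privileged context c
    and each sampled response token y_t, given prompt x and prefix y_<t:

        PMI_t = log p(y_t | x, c, y_<t) - log p(y_t | x, y_<t)

    Both inputs are (hypothetical) log-probabilities of the SAME sampled
    tokens, scored by the same model under the two conditionings.
    """
    if len(logp_with_context) != len(logp_without_context):
        raise ValueError("both sequences must score the same tokens")
    # Positive PMI: the context made this token more likely; negative: less.
    return [a - b for a, b in zip(logp_with_context, logp_without_context)]


def sequence_pmi(logp_with_context: Sequence[float],
                 logp_without_context: Sequence[float]) -> float:
    # By the chain rule of probability, summing per-token PMI yields
    # log p(y | x, c) - log p(y | x) for the whole response y.
    return sum(token_pmi(logp_with_context, logp_without_context))


if __name__ == "__main__":
    # Toy log-probs for a 3-token response (made-up numbers for illustration).
    with_ctx = [math.log(0.9), math.log(0.5), math.log(0.2)]
    without_ctx = [math.log(0.3), math.log(0.5), math.log(0.4)]
    print([round(v, 3) for v in token_pmi(with_ctx, without_ctx)])
    print(round(sequence_pmi(with_ctx, without_ctx), 3))
```

In this toy example, the first token is strongly supported by the context. The second is unaffected. The third becomes less likely once the context is present. Whether an RL objective should reward, penalize, or reweight such tokens is exactly the design question the paper's title points to. This sketch leaves that question open.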