Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information
arXiv:2605.11609v1 Announce Type: cross
Abstract: On-policy self-distillation, where a student is pulled toward a copy of itself conditioned on privileged context (e.g., a verified solution or feedback), offers a promising direction for advancing reas…