TIP: Token Importance in On-Policy Distillation
arXiv:2604.14084v2 Announce Type: replace
Abstract: On-policy knowledge distillation (OPD) trains a student on its own rollouts under token-level supervision from a teacher. Not all token positions matter equally, but existing views of token importanc…
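
The setup described in the abstract, a student scored token by token against a teacher on the student's own rollout, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the choice of reverse KL, KL(student || teacher), as the per-token signal; the function names `log_softmax`, `per_token_reverse_kl`, and `opd_loss`; and the hypothetical `weights` argument, which only stands in for the idea that positions may matter unequally and does not implement the paper's notion of token importance.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def per_token_reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at each position of one student rollout.

    Both arguments are lists of per-position logit rows over the same
    vocabulary. Reverse KL is an assumed choice of token-level signal.
    """
    losses = []
    for s_row, t_row in zip(student_logits, teacher_logits):
        log_s = log_softmax(s_row)
        log_t = log_softmax(t_row)
        kl = sum(math.exp(ls) * (ls - lt) for ls, lt in zip(log_s, log_t))
        losses.append(kl)
    return losses


def opd_loss(student_logits, teacher_logits, weights=None):
    """Weighted mean of the per-token divergences.

    `weights` is a hypothetical per-position weighting (uniform by
    default); it is a placeholder, not the paper's importance measure.
    """
    per_token = per_token_reverse_kl(student_logits, teacher_logits)
    if weights is None:
        weights = [1.0] * len(per_token)
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * l for w, l in zip(weights, per_token)) / total


if __name__ == "__main__":
    # Two positions, vocabulary of three tokens (made-up logits).
    student = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]
    teacher = [[0.0, 1.0, 2.0], [3.0, 0.0, 0.0]]
    print(per_token_reverse_kl(student, teacher))
    print(opd_loss(student, teacher))
```

In this toy input the first position contributes nothing because student and teacher agree there, while the second carries all of the loss. That is the sense in which positions can differ in how much supervision signal they provide.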