Yuxuan Jiang, Runchao Li, Shubhashis Roy Dipta, Dawei Li, Zhao Yang

Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation

May 12, 2026

arXiv:2605.09253v1 Announce Type: cross
Abstract: While recent work in Reinforcement Learning with Verifiable Rewards (RLVR) has shown that a small subset of critical tokens disproportionately drives reasoning gains, an analogous token-level understan…
