cs.AI, cs.CL

Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation

arXiv:2605.09253v1 Announce Type: cross
Abstract: While recent work in Reinforcement Learning with Verifiable Rewards (RLVR) has shown that a small subset of critical tokens disproportionately drives reasoning gains, an analogous token-level understan…