TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents
arXiv:2604.24005v2
Abstract: On-policy distillation (OPD) has shown strong potential for transferring reasoning ability from frontier or domain-specific models to smaller students. While effective on static single-turn tasks, it…