Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training
arXiv:2511.07372v3 Announce Type: replace
Abstract: Recent curriculum techniques in the post-training stage of LLMs have been empirically observed to outperform non-curriculum approaches in improving reasoning performance, yet a principled understanding of their effectiveness and limitations remains incomplete. To bridge this gap, we develop an abstract theoretical framework and identify sufficient conditions under which curriculum post-training yields exponential improvements in sample complexity.
To substantiate this framework, we model the base model's Chain-of-Thought generation as a state-conditioned autoregressive reasoning tree, and formalize curriculum subtasks as either depth-increasing curricula that progressively extend reasoning horizons or hint-decreasing curricula that gradually remove partial hints. Our analysis shows that reinforcement learning finetuning with either curriculum strategy achieves high accuracy with polynomial sample complexity, whereas the non-curriculum counterpart encounters an exponential complexity bottleneck. We further establish analogous guarantees for test-time scaling. Empirical simulations support our theoretical findings. Code is available at https://github.com/DakeBU/Curriculum-Post-training.
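The two curriculum families contrasted in the abstract can be sketched as simple stage schedules: one grows the reasoning horizon toward the full depth, the other shrinks the supplied hint toward zero. This is a hypothetical illustration only; the function names and parameters are not taken from the paper's code.

```python
# Illustrative sketch of the two curriculum schedules described above.
# All names and parameters are hypothetical, not from the released code.

def depth_increasing_curriculum(max_depth, stages):
    """Yield per-stage reasoning-horizon caps growing toward max_depth."""
    for s in range(1, stages + 1):
        yield max(1, round(max_depth * s / stages))

def hint_decreasing_curriculum(full_hint_len, stages):
    """Yield per-stage hint lengths shrinking from full hints to none."""
    for s in range(stages):
        yield round(full_hint_len * (1 - s / (stages - 1))) if stages > 1 else 0

print(list(depth_increasing_curriculum(8, 4)))  # [2, 4, 6, 8]
print(list(hint_decreasing_curriculum(6, 4)))   # [6, 4, 2, 0]
```

A non-curriculum baseline corresponds to training only at the final stage (full depth, no hints) from the start, which is where the exponential sample-complexity bottleneck arises in the analysis.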