Scaling Reasoning Tokens via RL and Parallel Thinking: Evidence From Competitive Programming
arXiv:2604.01302v1 Announce Type: new
Abstract: We study how to scale reasoning token budgets for competitive programming through two complementary approaches: training-time reinforcement learning (RL) and test-time parallel thinking. During RL traini…