Adaptive Test-Time Compute Allocation for Reasoning LLMs via Constrained Policy Optimization
arXiv:2604.14853v1 Announce Type: new
Abstract: Test-time compute scaling, the practice of spending extra computation during inference via repeated sampling, search, or extended reasoning, has become a powerful lever for improving large language model…
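The abstract names repeated sampling as one way to spend extra computation at inference time. The sketch below shows that baseline idea only. It assumes majority voting over sampled answers as the aggregation rule, and it uses a hypothetical simulated solver (`make_noisy_solver`) that returns the correct answer with a fixed probability. It is not the paper's method. The sketch spends the same number of samples on every query, which is the uniform budget that an adaptive allocation scheme would presumably vary per query.

```python
import random
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer among repeated samples."""
    return Counter(answers).most_common(1)[0][0]


def make_noisy_solver(
    p_correct: float, correct: str, distractors: List[str], rng: random.Random
) -> Callable[[], str]:
    """Hypothetical stand-in for an LLM: correct with probability p_correct,
    otherwise returns a uniformly chosen wrong answer."""
    def sample() -> str:
        if rng.random() < p_correct:
            return correct
        return rng.choice(distractors)
    return sample


def accuracy(n_samples: int, trials: int = 2000, p_correct: float = 0.6, seed: int = 0) -> float:
    """Fraction of simulated queries answered correctly when every query
    gets the same fixed budget of n_samples draws plus a majority vote."""
    rng = random.Random(seed)
    solver = make_noisy_solver(p_correct, "42", ["41", "43", "24", "7"], rng)
    hits = 0
    for _ in range(trials):
        votes = [solver() for _ in range(n_samples)]
        hits += majority_vote(votes) == "42"
    return hits / trials


if __name__ == "__main__":
    # Compare a single sample against a uniform budget of 15 samples per query.
    print(accuracy(1), accuracy(15))
```

Under these assumptions, voting helps because wrong answers are spread across several distractors while the correct answer concentrates the votes. The cost is a 15-fold increase in compute on every query, including the easy ones.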