Strategic Scaling of Test-Time Compute: A Bandit Learning Approach
arXiv:2506.12721v2
Abstract: Scaling test-time compute has emerged as an effective strategy for improving the performance of large language models. However, existing methods typically allocate compute uniformly across all …