Policy-Guided Stepwise Model Routing for Cost-Effective Reasoning
arXiv:2605.06116v1 Announce Type: new
Abstract: Inference-time computation has greatly enhanced the performance of large language models (LLMs) on challenging reasoning tasks, but this strategy can incur high inference costs. One solution is to route …