ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving
arXiv:2604.00136v2 Announce Type: replace
Abstract: Multi-model LLM serving operates in a non-stationary, noisy environment: providers revise pricing, model quality can shift or regress without notice, and new models arrive regularly. More than a doze…