Dynamic Model Routing and Cascading for Efficient LLM Inference: A Survey
arXiv:2603.04445v2
Abstract: The rapid growth of large language models (LLMs) with diverse capabilities, costs, and domains has created a critical need for intelligent model selection at inference time. While smaller model…
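The cascading idea named in the title can be illustrated with a minimal sketch. It assumes each model tier returns an answer together with a confidence score in [0, 1]. Tiers are tried cheapest first, and the query is escalated only when confidence falls below a threshold. The tier names, costs, threshold, and toy models are hypothetical placeholders, not the survey's method or any specific system's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    name: str
    cost: float  # hypothetical per-query cost of calling this model
    # Hypothetical model interface: returns (answer, confidence in [0, 1]).
    answer: Callable[[str], Tuple[str, float]]


def cascade(query: str, tiers: List[Tier], threshold: float) -> Tuple[str, str, float]:
    """Query tiers cheapest-first and stop at the first confident answer.

    The last tier's answer is always accepted. Returns
    (answer, name of the tier that answered, total cost spent).
    """
    spent = 0.0
    for i, tier in enumerate(tiers):
        answer, confidence = tier.answer(query)
        spent += tier.cost  # every tier that is called is paid for
        if confidence >= threshold or i == len(tiers) - 1:
            return answer, tier.name, spent
    raise ValueError("cascade needs at least one tier")


# Toy stand-ins for a small and a large model (hypothetical behavior).
def small_model(query: str) -> Tuple[str, float]:
    # Confident only on an easy query it "knows".
    return ("4", 0.95) if query == "2+2" else ("?", 0.2)


def large_model(query: str) -> Tuple[str, float]:
    return ("large-model answer", 0.9)


tiers = [Tier("small", 1.0, small_model), Tier("large", 10.0, large_model)]

easy = cascade("2+2", tiers, threshold=0.8)           # answered by the small tier
hard = cascade("prove the lemma", tiers, threshold=0.8)  # escalated to the large tier
```

An easy query stops at the small tier at cost 1.0. A hard one pays for both calls (11.0), which is the cost of escalation. A router, by contrast, would choose a single tier before calling any model, so this sketch covers only the cascading side.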