Yasmin Moslem, John D. Kelleher

Dynamic Model Routing and Cascading for Efficient LLM Inference: A Survey

Yasmin Moslem, John D. Kelleher / April 22, 2026

arXiv:2603.04445v2
Abstract: The rapid growth of large language models (LLMs) with diverse capabilities, costs, and domains has created a critical need for intelligent model selection at inference time. While smaller model…
