Rocks, Pebbles and Sand: Modality-aware Scheduling for Multimodal Large Language Model Inference
arXiv:2603.26498v1 Announce Type: cross
Abstract: Multimodal Large Language Models (MLLMs) power platforms like ChatGPT, Gemini, and Copilot, enabling richer interactions with text, images, and videos. These heterogeneous workloads introduce additiona…