Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs
arXiv:2605.04357v1 Announce Type: cross
Abstract: The usage of large language models (LLMs) has grown increasingly fragmented, with no single model dominating. Meanwhile, cloud providers offer a wide range of mid-tier and older-generation GPUs that en…