cs.AI, cs.CL, cs.DC, cs.LG

Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs

arXiv:2605.04357v1 Announce Type: cross
Abstract: The usage of large language models (LLMs) has grown increasingly fragmented, with no single model dominating. Meanwhile, cloud providers offer a wide range of mid-tier and older-generation GPUs that en…