cs.AI, cs.AR, cs.DC, cs.PL

Sandwich: Joint Configuration Search and Hot-Switching for Efficient CPU LLM Serving

arXiv:2507.18454v2 Announce Type: replace-cross
Abstract: CPUs are critical for LLM serving due to their availability, cost efficiency, and edge applicability. However, efficient CPU serving is hindered by conflicting prefill/decode resource demands u…