Sandwich: Joint Configuration Search and Hot-Switching for Efficient CPU LLM Serving
arXiv:2507.18454v2
Abstract: CPUs are critical for LLM serving due to their availability, cost efficiency, and edge applicability. However, efficient CPU serving is hindered by conflicting prefill/decode resource demands u…