cs.DC, cs.IT, cs.LG, cs.NI, math.IT

GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference

arXiv:2605.10124v1 Announce Type: cross
Abstract: The recent growth of on-device Large Language Model (LLM) inference has driven significant interest in device-edge collaborative LLM inference. As a promising architecture, Speculative Decoding (SD) is…