cs.AI, cs.NI, cs.PF

SparKV: Overhead-Aware KV Cache Loading for Efficient On-Device LLM Inference

arXiv:2604.21231v1 Announce Type: cross
Abstract: Efficient inference for on-device Large Language Models (LLMs) remains challenging due to limited hardware resources and the high cost of the prefill stage, which processes the full input context to co…
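The abstract is truncated, but the prefill and KV-cache cost it alludes to can be illustrated with a back-of-envelope calculation. The sketch below is not from the paper: the model dimensions (num_layers, num_kv_heads, head_dim) are hypothetical, roughly Llama-2-7B-like values chosen for illustration. It shows why, for long contexts, the KV cache grows to gigabytes and prefill compute grows linearly in context length, which is the bottleneck an overhead-aware loading scheme like SparKV targets.

```python
# Back-of-envelope KV cache sizing and prefill cost for a decoder-only
# transformer. All parameters are hypothetical (roughly Llama-2-7B-like),
# not values from the SparKV paper.

def kv_cache_bytes(seq_len: int,
                   num_layers: int = 32,
                   num_kv_heads: int = 32,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    """Bytes of K and V cached for one sequence at fp16 precision."""
    # Two tensors (K and V) per layer, each of shape
    # [seq_len, num_kv_heads, head_dim].
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * seq_len

def prefill_flops(seq_len: int, num_params: float = 7e9) -> float:
    """Rough prefill compute: ~2 FLOPs per parameter per input token."""
    return 2 * num_params * seq_len

if __name__ == "__main__":
    for n in (1_024, 4_096, 32_768):
        gib = kv_cache_bytes(n) / 2**30
        tflops = prefill_flops(n) / 1e12
        print(f"context {n:>6} tokens: KV cache ~ {gib:5.2f} GiB, "
              f"prefill ~ {tflops:8.1f} TFLOPs")
```

Under these assumed dimensions, a 4K-token context already requires about 2 GiB of KV cache and tens of TFLOPs of prefill compute, which is why on-device systems look to load or recompute only the cache entries that matter.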