IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs
arXiv:2604.10539v1 Announce Type: new
Abstract: The Key-Value (KV) cache plays a crucial role in accelerating inference in large language models (LLMs) by storing intermediate attention states and avoiding redundant computation during autoregressive generation.
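
For readers unfamiliar with the mechanism the abstract refers to, the following is a minimal sketch of generic KV caching during autoregressive decoding, not of IceCache's own method; the function and variable names (e.g. decode_step, cache) are illustrative assumptions.

```python
import numpy as np

def decode_step(x_t, W_q, W_k, W_v, cache):
    """One autoregressive decode step with a KV cache (illustrative only).

    x_t:   (d_model,) embedding of the newly generated token
    cache: dict with 'K' and 'V' arrays of shape (t, d_head),
           holding keys/values of all previously processed tokens
    """
    q = x_t @ W_q                      # query for the current token only
    k = x_t @ W_k                      # new key
    v = x_t @ W_v                      # new value

    # Append the new key/value instead of recomputing them for the whole
    # prefix -- this is the redundant computation the cache avoids.
    cache["K"] = np.vstack([cache["K"], k[None, :]])
    cache["V"] = np.vstack([cache["V"], v[None, :]])

    d_head = q.shape[-1]
    scores = cache["K"] @ q / np.sqrt(d_head)     # attend over all cached keys
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ cache["V"], cache            # attention output, updated cache

# Usage: run a few decode steps over random data.
d_model, d_head = 16, 16
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
cache = {"K": np.empty((0, d_head)), "V": np.empty((0, d_head))}
for _ in range(4):
    out, cache = decode_step(rng.normal(size=d_model), W_q, W_k, W_v, cache)
print(cache["K"].shape)   # (4, 16): one cached key per generated token
```

Because the cache grows by one key/value pair per generated token, its memory footprint scales with sequence length, which is the cost that memory-efficient cache management schemes such as the one announced here aim to reduce.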