IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs
arXiv:2604.10539v1 Announce Type: new
Abstract: The Key-Value (KV) cache plays a crucial role in accelerating inference in large language models (LLMs) by storing intermediate attention states and avoiding redundant computation during autoregressive generation.
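
For readers unfamiliar with the mechanism the abstract refers to, the following is a minimal sketch of generic KV caching during autoregressive decoding, not of IceCache's own method; the function and variable names (e.g. decode_step, cache) are illustrative assumptions.

```python
import numpy as np

def decode_step(x_t, W_q, W_k, W_v, cache):
    """One autoregressive decode step with a KV cache (illustrative only).

    x_t:   (d_model,) embedding of the newly generated token
    cache: dict with 'K' and 'V' arrays of shape (t, d_head),
           holding keys/values of all previously processed tokens
    """
    q = x_t @ W_q                      # query for the current token only
    k = x_t @ W_k                      # new key
    v = x_t @ W_v                      # new value

    # Append the new key/value instead of recomputing them for the whole
    # prefix -- this is the redundant computation the cache avoids.
    cache["K"] = np.vstack([cache["K"], k[None, :]])
    cache["V"] = np.vstack([cache["V"], v[None, :]])

    d_head = q.shape[-1]
    scores = cache["K"] @ q / np.sqrt(d_head)     # attend over all cached keys
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ cache["V"], cache            # attention output, updated cache

# Usage: run a few decode steps over random data.
d_model, d_head = 16, 16
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
cache = {"K": np.empty((0, d_head)), "V": np.empty((0, d_head))}
for _ in range(4):
    out, cache = decode_step(rng.normal(size=d_model), W_q, W_k, W_v, cache)
print(cache["K"].shape)   # (4, 16): one cached key per generated token
```

Because the cache grows by one key/value pair per generated token, its memory footprint scales with sequence length, which is the cost that memory-efficient cache management schemes such as the one announced here aim to reduce.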