KV Cache Offloading for Context-Intensive Tasks
arXiv:2604.08426v3
Abstract: With the growing demand for long-context LLMs across a wide range of applications, the key-value (KV) cache has become a critical bottleneck for both latency and memory usage. Recently, KV-cach…