cs.AI, cs.CL, cs.LG

TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference

arXiv:2604.19769v1 Announce Type: cross
Abstract: Key-value (KV) caching is critical for efficient inference in large language models (LLMs), yet its memory footprint scales linearly with context length, resulting in a severe scalability bottleneck. E…
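The linear scaling claimed in the abstract can be sketched with a back-of-the-envelope estimate. The formula and model shape below are illustrative assumptions, not figures from the paper: per token, each transformer layer stores one key and one value vector per KV head.

```python
# Illustrative KV cache size estimate (assumed formula, not from the paper):
# per token, each layer stores a key and a value vector for every KV head.
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    # factor of 2 accounts for keys *and* values
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

# Hypothetical 7B-class model: 32 layers, 32 KV heads, head_dim 128, fp16
for ctx in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(ctx, 32, 32, 128) / 2**30
    print(f"{ctx:>7} tokens -> {gib:.1f} GiB")
# →   4096 tokens -> 2.0 GiB
# →  32768 tokens -> 16.0 GiB
# → 131072 tokens -> 64.0 GiB
```

At roughly 0.5 MiB per token under these assumptions, the cache grows strictly linearly in context length, which is the scalability bottleneck the abstract refers to.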