TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference
arXiv:2604.19769v1 Announce Type: cross
Abstract: Key-value (KV) caching is critical for efficient inference in large language models (LLMs), yet its memory footprint scales linearly with context length, resulting in a severe scalability bottleneck. E…
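The linear scaling the abstract refers to can be sketched with simple arithmetic: the KV cache stores one key and one value tensor per layer per token, so its footprint is proportional to context length. The model dimensions below (layer count, KV-head count, head size, fp16 storage) are illustrative assumptions, not taken from the paper.

```python
def kv_cache_bytes(context_len: int,
                   n_layers: int = 32,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   dtype_bytes: int = 2,   # fp16/bf16
                   batch_size: int = 1) -> int:
    """Approximate KV-cache size: 2 tensors (K and V) per layer,
    each of shape [batch, n_kv_heads, context_len, head_dim]."""
    return (2 * n_layers * n_kv_heads * head_dim
            * dtype_bytes * context_len * batch_size)

# Footprint grows linearly with context length: doubling the
# context doubles the cache, which is the bottleneck TTKV targets.
per_token = kv_cache_bytes(1)
long_ctx_gb = kv_cache_bytes(128_000) / 1e9
```

With these (assumed) Llama-style dimensions, each token costs a fixed number of bytes, so a 128k-token context occupies roughly a thousand times the cache of a 128-token one.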