KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs
arXiv:2604.13226v2 Announce Type: replace-cross
Abstract: Large Language Models (LLMs) rely heavily on Key-Value (KV) caching to minimize inference latency. However, standard KV caches are context-dependent: reusing a cached document in a new context …
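One concrete reason standard KV caches are context-dependent is positional encoding: with rotary position embeddings (RoPE), a token's cached key vector is rotated by an angle that depends on its absolute position, so the same document cached at offset 0 produces different keys when it reappears after a prefix. The following toy NumPy sketch (not the paper's method; the `rope` helper is a minimal illustrative implementation) demonstrates this:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    # Toy rotary position embedding: rotate feature pairs of x by a
    # position-dependent angle (assumption: simplified single-vector form).
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) / half)   # per-pair rotation frequencies
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

np.random.seed(0)
k = np.random.randn(8)           # a token's key vector before position encoding
k_at_0 = rope(k, pos=0)          # document cached starting at position 0
k_at_50 = rope(k, pos=50)        # same document reused after a 50-token prefix

# The cached key no longer matches, so the position-0 cache cannot be
# reused verbatim in the new context.
print(np.allclose(k_at_0, k_at_50))  # False
```

This is only one source of context dependence; with causal self-attention, each token's keys and values also mix in information from all preceding tokens, so a new prefix changes them even before positional effects are considered.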