Comparative Characterization of KV Cache Management Strategies for LLM Inference
arXiv:2604.05012v1 Announce Type: cross
Abstract: Efficient inference with Large Language Models (LLMs) increasingly relies on Key-Value (KV) caches to store previously computed key and value vectors at each layer. These caches are essential to minimi…
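As context for the abstract above, the core mechanism it describes can be sketched in a few lines: during incremental decoding, each layer appends one new key/value pair per token to a cache instead of recomputing projections for the whole prefix. This is a minimal illustrative sketch (the class and function names are ours, not from the paper), using single-head attention and NumPy for brevity:

```python
import numpy as np

def attend(q, K, V):
    # Scaled dot-product attention of one query over all cached keys/values.
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

class KVCache:
    """Per-layer store of previously computed key/value vectors (hypothetical sketch)."""
    def __init__(self, d):
        self.K = np.empty((0, d))
        self.V = np.empty((0, d))

    def append(self, k, v):
        # One new (k, v) pair per decoded token; earlier rows are never recomputed.
        self.K = np.vstack([self.K, k])
        self.V = np.vstack([self.V, v])

# Incremental decoding loop: cache grows by one row per step.
rng = np.random.default_rng(0)
cache = KVCache(d=4)
for _ in range(3):
    k, v, q = rng.normal(size=(3, 4))
    cache.append(k, v)
    out = attend(q, cache.K, cache.V)

print(cache.K.shape)  # → (3, 4)
```

The cache trades memory for compute: its size grows linearly with sequence length, which is precisely the cost that the management strategies surveyed by the paper aim to control.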