cs.AI, cs.CL

DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference

arXiv:2604.24647v1 Announce Type: new
Abstract: Long-context reasoning is a critical capability of large language models (LLMs), enabling applications such as long-document understanding, summarization, and code generation. However, efficient autoregr…