DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing
arXiv:2604.19351v1 Announce Type: new
Abstract: The computational cost of the standard attention mechanism grows quadratically with sequence length, a fundamental bottleneck for long-context inference in large language models. While existing KV cache compression me…