DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing
arXiv:2604.19351v2 Announce Type: replace
Abstract: The quadratic computational complexity of the standard attention mechanism constitutes a fundamental bottleneck for large language models in long-context inference. While existing KV cache compressio…
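For context on the quadratic cost the abstract refers to: standard attention materializes an n × n score matrix over a length-n context, so time and memory grow as O(n²), and the K/V tensors retained across decoding steps form the KV cache that compression methods target. A minimal single-head sketch in Python/NumPy, purely illustrative and not the paper's DASH-KV method:

```python
# Minimal single-head attention sketch (illustrative only; not DASH-KV).
# The (n, n) score matrix S is the quadratic term in context length n.
import numpy as np

def attention(Q, K, V):
    # Q, K, V: (n, d) arrays for a sequence of n tokens.
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)              # (n, n) scores: O(n^2) work/memory
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)    # row-wise softmax
    return P @ V                          # (n, d) outputs

n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (1024, 64); K and V are what a KV cache stores per layer
```

Doubling n quadruples the size of S, which is why long-context inference motivates KV cache compression schemes like the one the abstract describes.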