StructKV: Preserving the Structural Skeleton for Scalable Long-Context Inference
arXiv:2604.06746v1 Announce Type: new
Abstract: As Large Language Models (LLMs) scale to support context windows exceeding one million tokens, the linear growth of the Key-Value (KV) cache imposes severe memory capacity and bandwidth bottlenecks, constrai…
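
To make the linear-growth claim concrete, here is a minimal back-of-the-envelope sketch of KV cache size versus context length. The model dimensions (layers, KV heads, head dimension, fp16 storage) are assumed for illustration only and are not taken from the paper.

```python
# Rough estimate of KV cache memory for one sequence.
# All model parameters below are illustrative assumptions, not from StructKV.

def kv_cache_bytes(context_len: int,
                   num_layers: int = 32,
                   num_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for `context_len` tokens."""
    # Factor of 2 accounts for storing both K and V per layer and head.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return context_len * per_token  # grows linearly with context length

for tokens in (8_192, 131_072, 1_048_576):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>9} tokens -> {gib:6.1f} GiB of KV cache")
```

Under these assumed dimensions the cache grows from about 1 GiB at 8K tokens to roughly 128 GiB at one million tokens, which is why million-token contexts hit memory capacity and bandwidth limits as the abstract notes.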