KVSculpt: KV Cache Compression as Distillation
arXiv:2603.27819v1 Announce Type: new
Abstract: KV cache compression is critical for efficient long-context LLM inference. Approaches that reduce the footprint of each key-value pair — quantization and low-rank decomposition — are orthogonal to those that reduce …
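To make the footprint-reduction axis concrete, here is a minimal sketch of per-token symmetric int8 quantization of a KV tensor, one of the techniques the abstract names. All function names and shapes are illustrative assumptions, not the paper's method or API.

```python
import numpy as np

def quantize_kv(kv: np.ndarray):
    """Per-token symmetric int8 quantization of a KV tensor.

    kv: float32 array of shape (num_tokens, head_dim).
    Returns int8 values plus one float32 scale per token,
    shrinking per-pair storage roughly 4x versus float32.
    (Hypothetical helper for illustration only.)
    """
    scale = np.abs(kv).max(axis=-1, keepdims=True) / 127.0
    scale = np.maximum(scale, 1e-8)  # guard against all-zero tokens
    q = np.clip(np.round(kv / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct an approximate float32 KV tensor."""
    return q.astype(np.float32) * scale

kv = np.random.randn(4, 64).astype(np.float32)
q, s = quantize_kv(kv)
recon = dequantize_kv(q, s)
```

Low-rank decomposition would instead factor the cached keys/values through a thin projection; both shrink each stored pair, which is why they compose with methods that drop or merge pairs outright.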