Bo Jiang, Sian Jin - Provide.ai

KVSculpt: KV Cache Compression as Distillation

Bo Jiang, Sian Jin / March 31, 2026

arXiv:2603.27819v1
Abstract: KV cache compression is critical for efficient long-context LLM inference. Approaches that reduce the per-pair footprint — quantization and low-rank decomposition — are orthogonal to those that reduce …
