LocalLLaMA

KV cache compression on Qwen 3.6 — 1M context: 10.7GB → 6.9GB (V: 3.5× smaller)

Quick demo of KV cache compression on Qwen 3.6 at 1M context. In this run:

KV cache: 10.74 GB → 6.92 GB
V cache: 5.37 GB → 1.55 GB (~3.5× reduction)

Still seeing near-zero PPL change in early tests (3 seeds), but focusing mainly on memory + long-…
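Here is a quick sanity check on those numbers as a minimal Python sketch. It rests on two assumptions that the post does not state. First, the KV total is just K + V. Second, the uncompressed cache is stored at 16 bits per element (fp16/bf16). The function name `kv_breakdown` is hypothetical and exists only for this illustration.

```python
def kv_breakdown(kv_before, kv_after, v_before, v_after, baseline_bits=16):
    """Derive K size, ratios and implied V bit-width from reported GB figures.

    Assumptions (not from the post): KV total = K + V, and the uncompressed
    cache uses `baseline_bits` bits per element (16 for fp16/bf16).
    """
    # K is whatever is left after subtracting V from the KV total.
    k_before = kv_before - v_before
    k_after = kv_after - v_after

    v_ratio = v_before / v_after      # compression factor on V alone
    kv_ratio = kv_before / kv_after   # compression factor on the whole cache
    saved = kv_before - kv_after      # absolute memory saved, in GB

    # Average storage per V element implied by the ratio (an inference only).
    v_bits = baseline_bits / v_ratio

    return {
        "k_before": k_before,
        "k_after": k_after,
        "v_ratio": v_ratio,
        "kv_ratio": kv_ratio,
        "saved_gb": saved,
        "v_bits": v_bits,
    }


if __name__ == "__main__":
    r = kv_breakdown(10.74, 6.92, 5.37, 1.55)
    print(f"K cache: {r['k_before']:.2f} GB -> {r['k_after']:.2f} GB")
    print(f"V: {r['v_ratio']:.2f}x, total KV: {r['kv_ratio']:.2f}x, "
          f"saved {r['saved_gb']:.2f} GB")
    print(f"Implied V storage: ~{r['v_bits']:.1f} bits/element")
```

Under those assumptions, the K cache is 5.37 GB in both runs, so only V was compressed. The whole cache shrinks by about 1.55× (3.82 GB saved), and V averages roughly 4.6 bits per element. That bit-width is just arithmetic on the reported sizes, not the actual storage format used in the run.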