LocalLLaMA

Experiment: Entropy + OLS + SVD for KV cache compression

I’ve been exploring KV cache optimization beyond Top-K pruning.

Observation: pruning fails *selectively* – a few tokens cause large error spikes.

So I tried:
- entropy (selection)
- OLS (reconstruction)
- SVD (compression)

Early results:
- ~3× lower er…
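
For anyone who wants to poke at the idea, here is a minimal NumPy sketch of one possible reading of the entropy / OLS / SVD pipeline. The post doesn't spell out the details, so everything here is an assumption for illustration: the function names (`entropy_select`, `ols_refit_values`, `svd_compress`), the use of per-token entropy contribution as the selection score, and refitting the kept *values* by least squares so the pruned attention output matches the full one on a set of probe queries. It runs on random data, not a real model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def entropy_select(attn, keep):
    # ASSUMPTION: score each cached token by its contribution to the
    # attention entropy, summed over probe queries, and keep the top `keep`.
    # attn has shape (num_queries, num_tokens); each row sums to 1.
    contrib = -(attn * np.log(attn + 1e-12)).sum(axis=0)
    return np.sort(np.argsort(contrib)[-keep:])

def ols_refit_values(Q, K, V, idx):
    # ASSUMPTION: ordinary least squares on the values of the kept tokens,
    # so that (attention over kept keys) @ V_fit approximates the full output.
    scale = 1.0 / np.sqrt(K.shape[1])
    target = softmax(Q @ K.T * scale) @ V          # full-cache output, (m, d)
    attn_kept = softmax(Q @ K[idx].T * scale)      # renormalized, (m, keep)
    V_fit, *_ = np.linalg.lstsq(attn_kept, target, rcond=None)
    return V_fit, attn_kept, target

def svd_compress(M, rank):
    # Truncated SVD: store two thin factors A (rows, rank) and B (rank, cols)
    # whose product approximates M.
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :rank] * S[:rank], Vt[:rank]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, m, keep, rank = 64, 16, 128, 16, 8       # toy sizes, all hypothetical
    Q = rng.normal(size=(m, d))
    K = rng.normal(size=(n, d))
    V = rng.normal(size=(n, d))

    attn = softmax(Q @ K.T / np.sqrt(d))
    idx = entropy_select(attn, keep)
    V_fit, attn_kept, target = ols_refit_values(Q, K, V, idx)
    A, B = svd_compress(V_fit, rank)

    err_prune = np.linalg.norm(attn_kept @ V[idx] - target)
    err_ols = np.linalg.norm(attn_kept @ V_fit - target)
    err_svd = np.linalg.norm(attn_kept @ (A @ B) - target)
    print("prune only:", err_prune)
    print("prune + OLS:", err_ols)
    print("prune + OLS + SVD:", err_svd)
```

On the probe queries used for the fit, the OLS step can't do worse than plain pruning, because the original kept values are one of the candidates the least-squares solve considers. Whether that carries over to unseen queries or real attention heads is exactly what would need measuring.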