Experiment: Entropy + OLS + SVD for KV cache compression

I’ve been exploring KV cache optimization beyond Top-K pruning.

Observation: pruning fails *selectively* - a few tokens cause large error spikes.

So I tried:

- entropy for token selection
- OLS (ordinary least squares) for reconstruction
- SVD for compression
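To make the three steps concrete, here is a rough NumPy sketch of how they could fit together. It is not the code from the blog post, and the details are assumptions made for illustration: the per-token score (each token's summed `-p log p` contribution across queries), the OLS target (the full-cache attention output), and all names (`entropy_scores`, `compress_kv`, `keep`, `rank`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def entropy_scores(attn, eps=1e-12):
    # Assumed scoring rule: each cached token's contribution -p*log(p)
    # to the attention entropy, summed over all query rows.
    return -(attn * np.log(attn + eps)).sum(axis=0)


def compress_kv(attn, values, keep, rank):
    # attn: (num_queries, num_tokens) attention weights, rows sum to 1.
    # values: (num_tokens, head_dim) cached value vectors.
    target = attn @ values  # output with the full, unpruned cache

    # 1) Selection: keep the `keep` tokens with the highest score.
    kept = np.sort(np.argsort(entropy_scores(attn))[-keep:])
    attn_kept = attn[:, kept]

    # 2) Reconstruction: OLS fit of replacement values so that
    #    attn_kept @ fitted is as close as possible to `target`.
    fitted, *_ = np.linalg.lstsq(attn_kept, target, rcond=None)

    # 3) Compression: truncated SVD of the fitted values.
    u, s, vt = np.linalg.svd(fitted, full_matrices=False)
    left = u[:, :rank] * s[:rank]  # (keep, rank)
    right = vt[:rank]              # (rank, head_dim)
    return kept, fitted, left, right


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attn = softmax(rng.normal(size=(64, 128)) * 2.0)  # synthetic weights
    values = rng.normal(size=(128, 32))               # synthetic values
    kept, fitted, left, right = compress_kv(attn, values, keep=16, rank=8)

    target = attn @ values
    err_prune = np.linalg.norm(attn[:, kept] @ values[kept] - target)
    err_ols = np.linalg.norm(attn[:, kept] @ fitted - target)
    err_svd = np.linalg.norm(attn[:, kept] @ (left @ right) - target)
    print(f"prune only: {err_prune:.3f}")
    print(f"+ OLS:      {err_ols:.3f}")
    print(f"+ SVD:      {err_svd:.3f}")
```

In this sketch the OLS step can never do worse than plain pruning on the queries it was fitted on, because the original kept values are one candidate the least-squares solver considers. Whether that carries over to unseen queries is a separate question, and one obvious place where an approach like this could break.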

Early results:

- ~3× lower error at low memory
- avoids error spikes
- sometimes even lower memory

Blog: https://jchandra.com/posts/hae-ols/

Still a prototype - would love feedback, especially where this might break.

submitted by /u/Many_Perception_1703