https://github.com/Dynamis-Labs/spectralquant
Basically, they discard 97% of the KV cache key vectors after figuring out which ones carry the most signal.
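
To make the idea concrete, here is a rough sketch of "score the key vectors, keep the top 3%, drop the rest". This is not the repo's code. The post doesn't say how signal is measured, so the L2 norm is used here purely as a stand-in, and `prune_keys` and `keep_fraction` are made-up names for illustration.

```python
import math
import random


def prune_keys(keys, keep_fraction=0.03):
    """Keep only the highest-scoring key vectors (hypothetical helper).

    keys: list of equal-length lists of floats, one per cached token.
    Returns (kept_indices, kept_keys), with indices in original order.
    """
    if not keys:
        return [], []
    # ASSUMPTION: "signal" is approximated by the L2 norm of each key vector.
    scores = [math.sqrt(sum(x * x for x in k)) for k in keys]
    # Always keep at least one vector so the cache is never empty.
    n_keep = max(1, round(len(keys) * keep_fraction))
    # Pick the n_keep highest-scoring indices, then restore positional order.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:n_keep]
    kept = sorted(top)
    return kept, [keys[i] for i in kept]


# Toy cache: 1000 random 8-dimensional key vectors.
random.seed(0)
cache = [[random.gauss(0, 1) for _ in range(8)] for _ in range(1000)]
kept_idx, kept_keys = prune_keys(cache)
print(len(kept_keys), "of", len(cache), "key vectors kept")
```

With `keep_fraction=0.03`, 97% of the key vectors are dropped. Whatever scoring the project actually uses would replace the norm line.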