You guys seen this? Beats TurboQuant by 18%
https://github.com/Dynamis-Labs/spectralquant

Basically, they discard 97% of the KV cache key vectors after figuring out which ones have the most signal.

submitted by /u/OmarBessa
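
For anyone wondering what "keep only the vectors with the most signal" could look like in the simplest case, here is a rough sketch. It is not taken from the spectralquant repo. The function name `keep_top_fraction` is hypothetical, and the squared L2 norm is only a stand-in score, since the post does not say how the project actually measures signal. It just shows the general shape of the idea: score every key vector, keep the top 3%, drop the rest.

```python
import math
import random


def keep_top_fraction(keys, fraction=0.03):
    """Return sorted indices of the highest-scoring key vectors.

    Hypothetical helper: the score is the squared L2 norm, used here only as
    a stand-in for whatever signal measure the project actually uses.
    """
    if not keys:
        return []
    # Keep at least one vector so the cache never becomes empty.
    n_keep = max(1, math.ceil(len(keys) * fraction))
    scores = [sum(x * x for x in vec) for vec in keys]
    # Rank indices by score, highest first, then keep the top n_keep.
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    rng = random.Random(0)
    # 1000 toy key vectors of dimension 8 (made-up data, not a real cache).
    keys = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(1000)]
    kept = keep_top_fraction(keys, fraction=0.03)
    print(len(kept), "of", len(keys), "key vectors kept")
```

A real implementation would have to pick the score carefully and deal with the vectors it throws away, so treat this as an illustration of the selection step only.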