LocalLLaMA

TurboQuant + TriAttention (C/HIP): ~6.8× total KV cache reduction in llama.cpp

Results from combining two KV-cache reduction methods in llama.cpp on AMD/HIP:

- TurboQuant KV cache compression (turbo3): ~5.1× reduction
- TriAttention KV cache pruning (75% retention): ~1.33× reduction
- Combined: ~6.8× total KV reduction

At 131K contex…