LocalLLaMA

TurboQuant + TriAttention (C/HIP): ~6.8× total KV cache reduction in llama.cpp

Results from combining two KV-cache reduction methods in llama.cpp on AMD/HIP:

- TurboQuant KV cache compression (turbo3): ~5.1× reduction
- TriAttention KV cache pruning (75% retention): ~1.33× reduction
- Combined: ~6.8× total KV reduction

At 131K contex…