Q8 Cache
https://github.com/ggml-org/llama.cpp/pull/21038 — since cache quantization now has better quality, does that mean a Q8 cache is a good choice now? For example, for the 26B Gemma4? submitted by /u/Longjumping_Bee_6825
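For context, a quantized KV cache in llama.cpp is enabled with the `--cache-type-k` / `--cache-type-v` flags. A minimal sketch of what "Q8 cache" means in practice; the model filename and context size below are placeholders, not taken from the post:

```shell
# Sketch: run the llama.cpp server with the KV cache quantized to Q8_0.
# The model path and context size are hypothetical examples.
./llama-server \
  -m ./gemma-model.gguf \
  -c 8192 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
```

Note that, depending on the build, quantizing the V cache has required flash attention to be enabled (`-fa`), so that flag may also be needed.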