Q8 Cache
https://github.com/ggml-org/llama.cpp/pull/21038 — since cache quantization now has better quality, does that mean a Q8 cache is a good choice now? For example, for the 26B Gemma4? submitted by /u/Longjumping_Bee_6825
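For context, a quantized KV cache in llama.cpp is enabled with the `--cache-type-k` / `--cache-type-v` flags. A minimal sketch of what "Q8 cache" means in practice; the model filename and context size below are placeholders, not taken from the post:

```shell
# Sketch: run the llama.cpp server with the KV cache quantized to Q8_0.
# The model path and context size are hypothetical examples.
./llama-server \
  -m ./gemma-model.gguf \
  -c 8192 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
```

Note that, depending on the build, quantizing the V cache has required flash attention to be enabled (`-fa`), so that flag may also be needed.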