LocalLLaMA

Q8 Cache

https://github.com/ggml-org/llama.cpp/pull/21038 Now that cache quantization has better quality, does that mean a Q8 KV cache is a good choice? For example, for a 26B Gemma 4? submitted by /u/Longjumping_Bee_6825
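For context, a minimal sketch of what enabling a Q8_0 KV cache looks like through the llama-cpp-python bindings; the model path is a placeholder, and note that llama.cpp generally requires flash attention when the V cache is quantized (the CLI equivalents are -ctk q8_0 -ctv q8_0):

import llama_cpp
from llama_cpp import Llama

# Placeholder GGUF path; any local Gemma quant would do for this sketch.
llm = Llama(
    model_path="./gemma-4-26b-q4_k_m.gguf",
    n_ctx=8192,
    flash_attn=True,                  # a quantized V cache needs flash attention
    type_k=llama_cpp.GGML_TYPE_Q8_0,  # quantize the K cache to Q8_0
    type_v=llama_cpp.GGML_TYPE_Q8_0,  # quantize the V cache to Q8_0
)

out = llm("What does a Q8_0 KV cache trade off?", max_tokens=64)
print(out["choices"][0]["text"])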

LocalLLaMA

Gemma 4 31B — 4bit is all you need

Gemma quant comparison on an M5 Max MacBook Pro with 128 GB (subjective, of course, but across a variety of categories): gemma 4 leaderboard. The surprising bit: Gemma 4 31B at 4-bit scored higher than at 8-bit, 91.3% vs 88.4%. Not sure why: could be the template, could …
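For anyone wanting to run a similar side-by-side on Apple silicon, a rough sketch with mlx-lm follows; the repo names and prompts here are placeholders, not the poster's actual harness:

from mlx_lm import load, generate

# Placeholder prompt set; the real comparison covered a variety of categories.
PROMPTS = [
    "Explain KV cache quantization in two sentences.",
    "Write a haiku about quantization error.",
]

# Hypothetical 4-bit and 8-bit quants of the same base model.
for repo in ("mlx-community/gemma-4-31b-4bit",
             "mlx-community/gemma-4-31b-8bit"):
    model, tokenizer = load(repo)
    for prompt in PROMPTS:
        answer = generate(model, tokenizer, prompt=prompt, max_tokens=128)
        print(f"{repo}: {answer[:80]}")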
