LocalLLaMA

Gemma 4 fixes in llama.cpp

There have already been claims that Gemma is bad because it doesn't work well, but chances are you aren't using the transformers implementation; you're using llama.cpp. After a model is released, you usually have to wait at least a few days for all the fixes in…

LocalLLaMA

Gemma 4 – 4B vs Qwen 3.5 – 9B ?

Hello! Has anyone tried the 4B Gemma 4 model and the Qwen 3.5 9B model and can share their feedback? On benchmarks Qwen seems to be doing better, but I would appreciate any personal experience on the matter. Thanks! submitted by /u/No-…

LocalLLaMA

Qwen 3.5 397B vs Qwen 3.6-Plus

I see a lot of people worried that Qwen 3.6 397B might not be released. However, looking at the small variation between 3.5 and 3.6 across many benchmarks, I think that simply quantizing 3.6 to "human" dimen…

LocalLLaMA

Quantizers appreciation post

Hey everyone, yesterday I decided to try and learn how to quantize GGUFs myself with reasonable quality, in order to understand the magic behind the curtain. Holy… I did not expect how much work it is, how long it takes, and that it requires A LOT (500GB!) o…
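The disk figure above is plausible: a rough back-of-the-envelope estimate shows how quickly quantization working space adds up. A minimal sketch, assuming approximate bits-per-weight values for common GGUF quant types (the exact averages vary with the tensor mix and llama.cpp version):

```python
# Rough GGUF size estimator. The bits-per-weight numbers are approximations,
# not exact values from any specific llama.cpp release.
APPROX_BPW = {
    "F16": 16.0,
    "Q8_0": 8.5,      # approximate, includes per-block scale overhead
    "Q4_K_M": 4.85,   # approximate average across tensors
}

def gguf_size_gb(n_params_billion: float, quant: str) -> float:
    """Estimate GGUF file size in GB for a given parameter count and quant type."""
    total_bits = n_params_billion * 1e9 * APPROX_BPW[quant]
    return total_bits / 8 / 1e9

# Quantizing typically keeps the full-precision intermediate on disk alongside
# each quantized output, so working space balloons well past the final size.
print(gguf_size_gb(70, "F16"))     # ~140 GB just for the F16 intermediate
print(gguf_size_gb(70, "Q4_K_M"))  # ~42 GB per Q4_K_M output
```

Add the original safetensors download on top of the F16 conversion and a handful of quant variants, and a 70B-class model easily chews through hundreds of gigabytes of scratch space.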

LocalLLaMA

Gemma 4 MoE hitting 120 TPS on Dual 3090s!

Thought I'd share some benchmark numbers from my local setup. Hardware: dual NVIDIA RTX 3090s. Model: Gemma 4 (MoE architecture). Performance: ~120 tokens per second. The efficiency of this MoE implementation is unreal. Even with a heavy load, the thr…
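For reference, throughput figures like the one above are usually just generated tokens divided by wall-clock time. A minimal sketch of that measurement; the `model.generate(prompt)` call is a hypothetical placeholder for whatever local backend you run:

```python
import time

def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput in tokens per second for a single generation run."""
    return n_tokens / elapsed_s

start = time.perf_counter()
# tokens = model.generate(prompt)   # backend-specific call, omitted here
elapsed = time.perf_counter() - start

# Example: 960 tokens generated in 8 seconds matches the ~120 TPS reported above.
print(tokens_per_second(960, 8.0))  # 120.0
```

Note that prompt-processing (prefill) speed and generation speed differ substantially, so it's worth reporting which one a TPS number refers to.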
