LocalLLaMA

Qwen 3.5 397B vs Qwen 3.6-Plus

I see a lot of people worried that Qwen 3.6 397B might not be released. However, looking at the small variation between 3.5 and 3.6 across many benchmarks, I think that simply quantizing 3.6 to "human" dimen…

LocalLLaMA

Quantizer appreciation post

Hey everyone, yesterday I decided to try to learn how to quantize GGUFs myself with reasonable quality, in order to understand the magic behind the curtain. Holy… I did not expect how much work it is, how long it takes, and that it requires A LOT (500GB!) o…
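For anyone curious what the "magic" boils down to: GGUF quant formats like Q4_0 store each block of 32 weights as 4-bit integers plus one shared scale. Here's a toy Python sketch of that idea to show where the quality loss comes from; it is an illustration of the concept, not llama.cpp's actual code, and the block size and weight distribution are assumptions.

```python
# Toy block-wise symmetric 4-bit quantization, in the spirit of GGUF's Q4_0:
# each block of 32 weights shares one scale; weights become integers in -8..7.
import random

BLOCK = 32  # Q4_0-style block size (assumption for this sketch)

def quantize_block(ws):
    # Symmetric scheme: map the largest-magnitude weight onto +/-7.
    scale = max(abs(w) for w in ws) / 7 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in ws]
    return scale, q

def dequantize_block(scale, q):
    return [scale * v for v in q]

random.seed(0)
weights = [random.gauss(0, 0.02) for _ in range(BLOCK)]  # fake layer weights
scale, q = quantize_block(weights)
recon = dequantize_block(scale, q)

# Reconstruction error: the per-block scale bounds it to roughly scale/2 per weight.
mse = sum((w - r) ** 2 for w, r in zip(weights, recon)) / BLOCK
```

Real quantizers (and especially imatrix-guided ones) layer a lot of calibration work on top of this, which is where the hundreds of gigabytes and hours go.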

LocalLLaMA

Gemma 4 MoE hitting 120 TPS on Dual 3090s!

Thought I'd share some benchmark numbers from my local setup. Hardware: Dual NVIDIA RTX 3090s Model: Gemma 4 (MoE architecture) Performance: ~120 Tokens Per Second The efficiency of this MoE implementation is unreal. Even with a heavy load, the thr…
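The back-of-the-envelope reason MoE decodes so fast: per token you only read the routed experts' weights, not the whole model, and decoding is roughly memory-bandwidth bound. A hedged sketch, where every model number below is hypothetical (not Gemma 4's actual configuration) and the bandwidth figure assumes ideal scaling across both cards:

```python
# Why MoE throughput beats a dense model of the same size on disk:
# only shared weights + top-k experts are touched per decoded token.

def active_params(n_experts, top_k, expert_params, shared_params):
    """Parameters read per token with top-k expert routing (toy model)."""
    return shared_params + top_k * expert_params

# Hypothetical MoE: 64 experts of 1.5B params each, 4B shared (attention etc.)
total = 64 * 1.5e9 + 4e9                      # ~100B params stored
active = active_params(64, 2, 1.5e9, 4e9)     # 7B params read per token

# Upper bound on decode speed: TPS <= bandwidth / bytes moved per token.
bandwidth = 2 * 936e9     # two RTX 3090s at ~936 GB/s each, ideal scaling
bytes_per_param = 0.5     # ~4-bit quantized weights
tps_bound = bandwidth / (active * bytes_per_param)
```

The bound comes out far above 120 TPS; real runs land well below it because of inter-GPU transfers, activations, KV-cache reads, and imperfect overlap, but it shows why a ~100B MoE can decode like a ~7B dense model.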
