LocalLLaMA

Gemma4 issue with winogrande bench

gemma-4-26B-A4B-it-Q4_K_M only gets around 50% accuracy on winogrande-debiased-eval.csv with llama-perplexity, while qwen3.5-35B-A3B-IQ4_NL gets about 75%+. In real-world tasks, however, the Gemma 4 model performs very well. Why does this disc…
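Worth noting why ~50% is suspicious: Winogrande is a binary-choice task, so the harness scores both candidate completions and picks the one the model assigns higher log-likelihood. 50% accuracy is coin-flip performance, which usually points to a scoring, template, or tokenization problem rather than a genuinely weak model. A minimal sketch of that scoring logic (not llama.cpp's actual code; `sentence_logprob` is a hypothetical stand-in for a real model call):

```python
# Sketch of Winogrande-style binary-choice scoring. Each item is a sentence
# with a blank ("_") and two candidate fillers; the model's "choice" is
# whichever filled-in sentence gets the higher log-probability.

def sentence_logprob(text: str) -> float:
    # Hypothetical stand-in: a real harness sums per-token log-probs from
    # the model. Here we fake it with a toy length-based score so the
    # sketch is self-contained (shorter text scores higher).
    return -float(len(text))

def score_item(template: str, option1: str, option2: str, answer: int) -> bool:
    """Return True if the model-preferred option matches the gold answer."""
    lp1 = sentence_logprob(template.replace("_", option1))
    lp2 = sentence_logprob(template.replace("_", option2))
    choice = 1 if lp1 >= lp2 else 2
    return choice == answer

items = [
    ("The cup would not fit in the box because _ was too small.",
     "the box", "the cup", 1),
]
accuracy = sum(score_item(*it) for it in items) / len(items)
```

If any step in that pipeline is broken for one model family (chat template applied where it shouldn't be, wrong tokenization of the blank, a quant bug), the two log-probs become noise and accuracy collapses to chance, exactly the 50% pattern described above.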

LocalLLaMA

Gemma 4 fixes in llama.cpp

There have already been opinions that Gemma is bad because it doesn't work well, but you probably aren't using the transformers implementation; you're using llama.cpp. After a model is released, you have to wait at least a few days for all the fixes in…

LocalLLaMA

Gemma 4 – 4B vs Qwen 3.5 – 9B ?

Hello! Has anyone tried the 4B Gemma 4 model and the Qwen 3.5 9B model and can share their feedback? On benchmarks Qwen seems to do better, but I would appreciate any personal experience on the matter. Thanks! submitted by /u/No-…

LocalLLaMA

Qwen 3.5 397B vs Qwen 3.6-Plus

I see a lot of people worried about the possibility of Qwen 3.6 397B not being released. However, looking at the small percentage of variation between 3.5 and 3.6 on many benchmarks, I think that simply quantizing 3.6 to "human" dimen…
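On the quantization point: the main lever is bits per weight, so a back-of-envelope size estimate is just parameters × bits-per-weight ÷ 8. A quick sketch using the 397B figure from the post (the bits-per-weight numbers are approximate averages for common llama.cpp quant types; real GGUF files vary a little because some layers are kept at higher precision):

```python
# Rough quantized-model file size: params * bits_per_weight / 8 bytes.
# BPW values are approximate averages for llama.cpp quant types.
BPW = {"Q8_0": 8.5, "Q4_K_M": 4.85, "IQ4_NL": 4.5, "Q2_K": 2.6}

def approx_size_gb(n_params: float, quant: str) -> float:
    """Approximate GGUF size in GB for a given parameter count and quant."""
    return n_params * BPW[quant] / 8 / 1e9

n = 397e9  # parameter count mentioned in the post
for q, bpw in BPW.items():
    print(f"{q} (~{bpw} bpw): ~{approx_size_gb(n, q):.0f} GB")
```

At ~4.5 bpw a 397B model still lands well above 200 GB, which is the practical constraint behind wanting it shrunk to "human" sizes in the first place.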
