LocalLLaMA

Gemma 4 31B vs Gemma 4 26B-A4B vs Qwen 3.5 27B — 30-question blind eval with Claude Opus 4.6 as judge

/u/Silver_Raspberry_811 / April 5, 2026

Just finished a 3-way head-to-head. Sharing the raw results because this sub has been good about poking holes in methodology, and I'd rather get that feedback than pretend my setup is perfect. Setup: 30 questions, 6 per category (code, reasoning, a…

LocalLLaMA

But it’s so much more fun

/u/moneyspirit25 / April 5, 2026


LocalLLaMA

Gemma 4 for 16 GB VRAM

/u/Sadman782 / April 5, 2026

I think the 26B A4B MoE model is superior for 16 GB. I tested many quantizations, but if you want to keep the vision, I think the best one currently is: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/blob/main/gemma-4-26B-A4B-it-UD-IQ4_XS.gguf …

LocalLLaMA

Gemma 4 vs Whisper

/u/HuntKey2603 / April 5, 2026

Working on building live Closed Captions for Discord calls for my TTRPG group. With Gemma being able to do voice transcription and translation, does it still make sense to run Whisper + a smaller model for translation? Is it better, faster, or has some…

LocalLLaMA

It's all about the harness

/u/Emotional-Breath-838 / April 5, 2026

over the course of the arc of local model history (the past six weeks) we have reached a plateau with models and quantization that would have left our ancient selves (back in the 2025 dark ages) stunned and gobsmacked at the progress we currently enjoy…

LocalLLaMA

Improved markdown quality, code intelligence for 248 languages, and more in Kreuzberg v4.7.0

/u/Eastern-Surround7763 / April 5, 2026

Kreuzberg v4.7.0 is here. Kreuzberg is a Rust-core document intelligence library that works with Python, TypeScript/Node.js, Go, Ruby, Java, C#, PHP, Elixir, R, C, and WASM. We’ve added several features, integrated Open WebUI, and made a big improvemen…

LocalLLaMA

Gemma 4 26b is the perfect all around local model and I’m surprised how well it does.

/u/pizzaisprettyneato / April 5, 2026

I got a Mac with 64 GB of memory about a month ago and I've been trying to find a model that is reasonably quick, decently good at coding, and doesn't overload my system. The test I've been running is having it create a Doom-style raycaster in HTML a…

LocalLLaMA

Basic PSA. PocketPal got updated, so runs Gemma 4.

/u/Sambojin1 / April 5, 2026

Just because I've seen a couple of "I want this on Android" questions, PocketPal got updated a few hours ago, and runs Gemma 4 2B and 4B fine. At least on my hardware (crappy little moto g84 workhorse phone). Love an app that gets regular…

LocalLLaMA

How to design capacity for running LLMs locally? Asking for a startup

/u/Final-Batz / April 5, 2026

Hello everyone. I'm at a startup with a team of fewer than 10 people. Everyone on our team wants to use AI to speed up their work and iron out issues faster. We use LLMs for coding, sales presentations, pitch p…

LocalLLaMA

Signals – finding the most informative agent traces without LLM judges (arxiv.org)

/u/AdditionalWeb107 / April 5, 2026

Hello peeps, Salman, Shuguang and Adil here from Katanemo Labs (a DigitalOcean company). Wanted to introduce our latest research on agentic systems called Signals. If you've been building agents, you've probably noticed that there are far …