LocalLLaMA

If you haven’t yet given Gemma 4 a go…do it today

/u/No-Anchovies / April 11, 2026

I have a modest rig that allows me to run Qwen 3.5 27B or even 35B via Ollama. Qwen has been amazing to work with and I've been fine with the slow drip trade-off. Then Google released Gemma 4. It's fast, like 4B or 9B fast. Accuracy and confidence w…

LocalLLaMA

The definitive Qwen 3.5 Jinja template

/u/ex-arman68 / April 11, 2026

I’ve been doing a pretty thorough deep dive into the Qwen 3.5 templating logic to properly fix the lingering tool-calling bugs. People here have done some really brilliant groundwork; templates from folks like @pneuny and @ellary were absolute lifesave…

LocalLLaMA

Dual A100X local workflow

/u/vitamins1000 / April 11, 2026

Came across these A100Xs at work and decided to keep them for internal use. We weren't sure what to use them for, but I came up with a workflow that uses RAG to let a local model access our inventory database and have users interact with …

LocalLLaMA

Experience of using OpenClaude and Gemma4 26b

/u/nonekanone / April 11, 2026

Hi guys, I am relatively new to the local LLM scene, and today I downloaded my first local LLM, Gemma 4 26b. I am using Ollama and am running on an M1 Max with 32GB of RAM. When I just use Gemma 4 inside of Ollama, it works like a charm. It …

LocalLLaMA

FT – China’s Alibaba shifts towards revenue over open-source AI

/u/LegacyRemaster / April 11, 2026

https://www.ft.com/content/b39da303-3188-447b-8b65-3dd8dad8b59a?syn-25a6b1a6=1t Is it true?

LocalLLaMA

Simulating human cognition in LLM agents: a free 126K-word book covering memory decay, emotion engines, personality drift, and 12 other cognitive subsystems

/u/Awkward-Educator6293 / April 11, 2026

Most LLM agents treat the model as the entire cognitive system. System prompt defines personality, RAG handles memory, chain-of-thought handles planning. It works until it doesn't, and when it breaks, there's no structural theory to debug again…

LocalLLaMA

Don’t buy Mac Studio now.

/u/JacketDangerous9555 / April 11, 2026

I've been totally obsessed with local models lately, and with some cybersecurity projects that need to run locally, I'm gearing up to grab a Mac Studio—staring at this page every day. And I just found out!!! Last month, after Apple quietl…

LocalLLaMA

If Dense Models are better for Coding, why are Qwen-Coders MoE?

/u/LocalLLaMa_reader / April 11, 2026

Hi all, I've been reading here for over two years and finally have a question I can't find an answer to. Qwen 3.5 27B and Gemma 4 31B have been the latest examples of dense models performing much more accurately and in general tasks requiring highe…

LocalLLaMA

DFlash speculative decoding on Apple Silicon: 85 tok/s, 3.3x on Qwen3.5-9B (MLX, M5 Max)

/u/No_Shift_4543 / April 11, 2026

I'm building a native MLX implementation of DFlash (paper) for Apple Silicon. A small draft model generates 16 tokens in parallel via block diffusion, and the target verifies them in one forward pass. Output is bit-for-bit identical to baseline (…

LocalLLaMA

Run Qwen3.5-397B-A13B with vLLM and 8xR9700

/u/djdeniro / April 11, 2026

Special thanks to u/Sea-Speaker1700 for making it possible to run mxfp4 on the R9700 GPU; the first guide, for running 122B models, is here. Well, the 397B model works amazingly and is super fast. Use this Dockerfile to build the image (original image provided by u/Sea-Speaker1700): FROM…