LocalLLaMA

Qwen3.6-35B-A3B released!

Meet Qwen3.6-35B-A3B: now open source! 🚀🚀 A sparse MoE model, 35B total params, 3B active. Apache 2.0 license. – Agentic coding on par with models 10x its active size – Strong multimodal perception and reasoning ability – Multimodal thinking + non-…
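The headline split (35B total params, 3B active) is what top-k expert routing buys in a sparse MoE: the router scores all experts per token but only runs the top few. A minimal sketch of that accounting, using made-up toy dimensions rather than the real Qwen3.6-35B-A3B configuration:

```python
import numpy as np

# Toy illustration of sparse MoE routing. All sizes below are invented
# for the sketch; they are NOT the real Qwen3.6-35B-A3B config.
rng = np.random.default_rng(0)

d_model = 64        # hidden size (toy)
n_experts = 16      # experts per MoE layer (toy)
top_k = 2           # experts actually run per token (toy)

# Router: score every expert for this token, keep only the top-k.
x = rng.standard_normal(d_model)
router_w = rng.standard_normal((n_experts, d_model))
scores = router_w @ x
active = np.argsort(scores)[-top_k:]

# Parameter accounting: per token, only top_k / n_experts of the
# expert weights participate in the forward pass.
expert_params = n_experts * d_model * d_model
active_params = top_k * d_model * d_model
print(active_params / expert_params)  # 2/16 = 0.125
```

The same ratio logic is why a 35B-total model can decode at roughly the cost of a ~3B dense model while keeping a much larger pool of learned weights.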

LocalLLaMA

Reproduction of TurboQuant

There have been many TurboQuant implementations recently in llama.cpp, mlx, vllm, and sglang, but a lot of the discussion and code around them feels pretty noisy and looks to be AI-generated. I’m trying to understand which claims from the paper have ac…

LocalLLaMA

A note of warning about DFlash.

It started with claims of a 4-5x speed advantage over the usual bf16 models (tests are less optimistic, but let's assume it's true). Then it turned out the MoE gain is not that good; the quoted value was for dense models. Then quantization greatly reduces the gain: Q8_0 still gains, Q4_0 not …
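One way to see why the gain would shrink under quantization: if the baseline decode time is dominated by streaming the weights, and the acceleration amortizes weight reads over several tokens at some fixed per-token overhead, then a smaller weight footprint (fewer bytes per param) leaves less to amortize while the overhead stays constant. This is an illustrative toy model, not measured DFlash behavior; the acceptance count and overhead below are invented numbers:

```python
# Toy bandwidth model: decode time ~ bytes streamed per token, plus a
# fixed overhead for the acceleration mechanism. All numbers invented.
def toy_speedup(bytes_per_param, accepted_tokens=4, overhead=0.2):
    # Baseline: one full weight read per generated token.
    base = bytes_per_param
    # Accelerated: one weight read amortized over `accepted_tokens`,
    # plus a fixed per-token overhead.
    fast = bytes_per_param / accepted_tokens + overhead
    return base / fast

print(round(toy_speedup(2.0), 2))   # bf16  (~2 bytes/param): 2.86
print(round(toy_speedup(1.0), 2))   # Q8_0  (~1 byte/param):  2.22
print(round(toy_speedup(0.5), 2))   # Q4_0  (~0.5 bytes/param): 1.54
```

Under these assumptions the relative speedup falls monotonically as the quantization gets more aggressive, which matches the pattern the post describes (Q8_0 still gains, Q4_0 much less).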

LocalLLaMA

Why is Gemma 4 31B so bad at long context?

Question: I'm using it for text translation, and on each large prompt (20K+) it stops with a remark like 'now I'm going to put that to the file' or some other operation I asked for in the prompt, but it did nothing, it just stopped. I'm…
