LocalLLaMA

Qwen3.6-35B-A3B released!

Meet Qwen3.6-35B-A3B: now open source! 🚀🚀 A sparse MoE model, 35B total params, 3B active. Apache 2.0 license. – Agentic coding on par with models 10x its active size – Strong multimodal perception and reasoning ability – Multimodal thinking + non-…
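The headline split (35B total params, 3B active) is what top-k expert routing buys in a sparse MoE: the router scores all experts per token but only runs the top few. A minimal sketch of that accounting, using made-up toy dimensions rather than the real Qwen3.6-35B-A3B configuration:

```python
import numpy as np

# Toy illustration of sparse MoE routing. All sizes below are invented
# for the sketch; they are NOT the real Qwen3.6-35B-A3B config.
rng = np.random.default_rng(0)

d_model = 64        # hidden size (toy)
n_experts = 16      # experts per MoE layer (toy)
top_k = 2           # experts actually run per token (toy)

# Router: score every expert for this token, keep only the top-k.
x = rng.standard_normal(d_model)
router_w = rng.standard_normal((n_experts, d_model))
scores = router_w @ x
active = np.argsort(scores)[-top_k:]

# Parameter accounting: per token, only top_k / n_experts of the
# expert weights participate in the forward pass.
expert_params = n_experts * d_model * d_model
active_params = top_k * d_model * d_model
print(active_params / expert_params)  # 2/16 = 0.125
```

The same ratio logic is why a 35B-total model can decode at roughly the cost of a ~3B dense model while keeping a much larger pool of learned weights.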

LocalLLaMA

Reproduction of TurboQuant

There have been many TurboQuant implementations recently in llama.cpp, mlx, vllm, and sglang, but a lot of the discussion and code around them feels pretty noisy and looks to be AI-generated. I’m trying to understand which claims from the paper have ac…

LocalLLaMA

A note of warning about DFlash.

It started with claims of a 4-5x speed advantage over the usual bf16 models (tests are less optimistic, but let's assume it's true). Then it turned out the MoE gain is not that good; the quoted value was for dense models. Then quantization greatly reduces the gain: Q8_0 still gains, Q4_0 not …
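One way to see why the gain would shrink under quantization: if the baseline decode time is dominated by streaming the weights, and the acceleration amortizes weight reads over several tokens at some fixed per-token overhead, then a smaller weight footprint (fewer bytes per param) leaves less to amortize while the overhead stays constant. This is an illustrative toy model, not measured DFlash behavior; the acceptance count and overhead below are invented numbers:

```python
# Toy bandwidth model: decode time ~ bytes streamed per token, plus a
# fixed overhead for the acceleration mechanism. All numbers invented.
def toy_speedup(bytes_per_param, accepted_tokens=4, overhead=0.2):
    # Baseline: one full weight read per generated token.
    base = bytes_per_param
    # Accelerated: one weight read amortized over `accepted_tokens`,
    # plus a fixed per-token overhead.
    fast = bytes_per_param / accepted_tokens + overhead
    return base / fast

print(round(toy_speedup(2.0), 2))   # bf16  (~2 bytes/param): 2.86
print(round(toy_speedup(1.0), 2))   # Q8_0  (~1 byte/param):  2.22
print(round(toy_speedup(0.5), 2))   # Q4_0  (~0.5 bytes/param): 1.54
```

Under these assumptions the relative speedup falls monotonically as the quantization gets more aggressive, which matches the pattern the post describes (Q8_0 still gains, Q4_0 much less).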

LocalLLaMA

Why is Gemma 4 31B so bad at long context?

Question: I'm using it for text translation, and on each large prompt (20K+) it stops with a remark like 'now I'm going to put that to the file' or some other operation I asked for in the prompt, but it did nothing, it just stopped. I'm…
