Just wanted to share that I'm pretty happy with Qwen 35b a3b's agentic coding performance.
I'm running the model in a q8_0 quant, with the KV cache (both K and V) at q8_0 as well and a 262144 context, on a 4090 + 5060 Ti, via the llama.cpp backend with Claude Code pointing to localhost.
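In case it helps anyone reproduce this, here's a rough Python sketch of that setup. It only builds the `llama-server` command line and the environment variable for Claude Code, it doesn't launch anything. The model filename, port, and the `--n-gpu-layers 99` value are placeholders I made up for illustration, not my exact values, and whether Claude Code can talk to the server directly (versus through a translating proxy) depends on your llama.cpp build, so treat that part as an assumption.

```python
import shlex


def build_llama_server_cmd(model_path, ctx_size=262144, kv_type="q8_0",
                           host="127.0.0.1", port=8080):
    """Return an argv list for llama-server with a quantized KV cache."""
    return [
        "llama-server",
        "--model", model_path,          # hypothetical path to the q8_0 GGUF file
        "--ctx-size", str(ctx_size),    # 262144-token context, as in the post
        "--cache-type-k", kv_type,      # K cache quantization
        "--cache-type-v", kv_type,      # V cache quantization
        "--n-gpu-layers", "99",         # assumption: offload all layers to the GPUs
        "--host", host,
        "--port", str(port),            # assumption: any free local port works
    ]


def claude_code_env(host="127.0.0.1", port=8080):
    """Environment override that points Claude Code at the local server."""
    return {"ANTHROPIC_BASE_URL": f"http://{host}:{port}"}


if __name__ == "__main__":
    # The filename below is a placeholder, not a real release name.
    cmd = build_llama_server_cmd("models/qwen-35b-a3b-q8_0.gguf")
    print(shlex.join(cmd))
    print(claude_code_env())
```

Splitting the model across the two cards may need extra flags depending on your build, so I left that out rather than guess.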
For demo/data analytics purposes, it works pretty well. I haven't used it on large codebases, but it is definitely better than gemma4 26b in my use case.
One thing that surprises me is that it seems to get better outcomes in agentic coding than in chat. When using it with just a chat UI, I found the code qwen35b provides a bit too clunky.
I wonder if others have compared how it performs with open source harnesses (Pi / opencode).