I thought it would take way longer (and a MacBook of the future) to do real coding locally. But it is happening in front of my eyes right now!
I'm using qwen3.5 35B EDIT: qwen3.6 35B (MLX 4-bit, running on omlx). It is not comparable to the big models, but it is the first one that is starting to cross the line into being agentically productive. It is intelligent enough not only to answer in a chat, but to solve problems, write code, and use tools. And it is FAST.
The other part of the equation is how to give it the powers to do agentic tasks. Most tools I've tried (claude code, opencode, codex cli, etc.) rely on gigantic prompt injections. They are so heavy that prompt processing takes ages and RAM explodes. So I thought I wouldn't be able to use any local model agentically until I get a new laptop. Maybe with an M7 or M8 lol.
But then I started testing pi (pi.dev), and with it I've already been able to complete 3 real tickets on a real project!
It seems to be very efficient at understanding the project and reading only the necessary code. It one-shotted one of the tickets consuming only around 7K tokens!!
For the other 2 I had to prompt back some errors from the browser console (I guess this could get better by adding a rule to verify with Playwright before finishing a task).
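As a sketch, that rule could be as simple as an extra line in the project's agent instructions file (the wording below is my own, not an actual pi config):

```
Before reporting a task as done, load the affected page with Playwright
and verify that the browser console shows no errors or warnings.
```

With something like this the agent would catch the console errors itself instead of me pasting them back in.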
The only annoying problem so far is when qwen3.6 starts looping in its thinking. I'm using the official sampling settings for coding with reasoning:
Thinking mode for precise coding tasks (e.g. WebDev): temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
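For anyone wanting to try the same settings, they map directly onto an OpenAI-compatible request body; here's a minimal sketch assuming omlx (or any local server) exposes such an endpoint — the model id and prompt are placeholders, and some servers ignore non-standard keys like top_k/min_p:

```python
import json

# Qwen's recommended thinking-mode sampling settings for precise coding,
# packed into an OpenAI-style chat completions request body.
# "qwen-coder-local" is a placeholder model id -- use whatever id your
# server registers. top_k, min_p, and repetition_penalty are extensions
# that llama.cpp/MLX-style servers accept but the strict OpenAI API does not.
payload = {
    "model": "qwen-coder-local",
    "messages": [{"role": "user", "content": "Fix the failing test"}],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.0,
}

# Serialize as you would before POSTing to the /v1/chat/completions endpoint.
body = json.dumps(payload)
```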
I also have a 126K context configured in omlx. Maybe the problem is the 4-bit MLX quant?