Good day. I am looking for the best local model for a coding agent. I might have missed something, or some model that is not widely used, so I came here for help.
Currently I have the following models that I have found useful for agentic coding, running with Google's TurboQuant applied on llama.cpp:
- GLM 4.7 Flash Q4_K_M -> 30B
- Nemotron 3 Q4_K_M -> 30B
- Qwen3 Coder Next Q4_K_M -> 80B
I really tried to get Qwen3 Coder Next to a decent t/s for input and output, as I thought it would be a killer, but to my surprise it sometimes makes such silly mistakes that I have to do a lot of babysitting in the agentic flow.
GLM 4.7 and Nemotron are the ones I really can't decide between: both have decent t/s for agentic coding, and I use both with a maxed-out context window.
The thing is, I feel there might be some model that I have simply overlooked.
Any suggestions?
My rig:
RTX 4090, 64GB 5600 MT/s RAM
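For context, here is a rough back-of-the-envelope sketch of how the weights alone fit on this rig. The 4.85 bits per weight figure is my approximation for Q4_K_M, the 24 GB is the 4090's VRAM, and KV cache and context overhead are ignored, so treat the numbers as ballpark only:

```python
# Rough weight-memory estimate for the models listed above.
# Assumption: Q4_K_M averages about 4.85 bits per weight (approximate).
BITS_PER_WEIGHT = 4.85
VRAM_GB = 24  # RTX 4090
RAM_GB = 64   # system RAM

# Parameter counts in billions, as listed in the post.
MODELS = {"GLM 4.7 Flash": 30, "Nemotron 3": 30, "Qwen3 Coder Next": 80}


def weight_gb(params_billion, bits_per_weight=BITS_PER_WEIGHT):
    """Approximate size of the quantized weights in GB (1e9 bytes)."""
    return params_billion * bits_per_weight / 8


for name, size in MODELS.items():
    gb = weight_gb(size)
    where = "fits in VRAM" if gb <= VRAM_GB else "needs partial CPU offload"
    print(f"{name}: ~{gb:.1f} GB of weights, {where}")
```

By this estimate the two 30B models sit around 18 GB, while the 80B model is around 48 GB and has to spill into system RAM.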
Thank you in advance