Best local model for an RTX 4090 as an AI coding agent

Good day. I am looking for the best local model to use as a coding agent. I may have missed a model that is not widely used, so I came here for help.

These are the models I have found useful so far for agentic coding, running on llama.cpp with Google's TurboQuant applied:

  • GLM 4.7 Flash Q4_K_M -> 30B
  • Nemotron 3 Q4_K_M -> 30B
  • Qwen3 Coder Next Q4_K_M -> 80B

I tried hard to get decent input and output t/s out of Qwen3 Coder Next because I thought it would be a killer, but to my surprise it sometimes makes such silly mistakes that I have to do a lot of babysitting during agentic flows.

GLM 4.7 Flash and Nemotron are the two I really can't decide between. Both give decent t/s for agentic coding, and I run both with the context window maxed out.

The thing is, I feel there might be a model I have overlooked.

Any suggestions?

My rig:
RTX 4090, 64 GB RAM at 5600 MT/s
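
For a rough sense of what fits on this rig, here is a back-of-the-envelope sketch. It assumes about 4.8 bits per weight for Q4_K_M (an approximation, not a measured figure) and the 4090's 24 GB of VRAM, and it ignores KV cache, context, and runtime overhead. The function names are made up for illustration.

```python
# Rough estimate of quantized weight size versus available VRAM.
# ASSUMPTION: ~4.8 bits per weight for Q4_K_M; real GGUF files vary by model.
# ASSUMPTION: weights only; KV cache and runtime overhead are not counted.

ASSUMED_BITS_PER_WEIGHT = 4.8
VRAM_GB = 24.0  # RTX 4090


def estimate_weight_gb(params_billion: float,
                       bits_per_weight: float = ASSUMED_BITS_PER_WEIGHT) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_vram(params_billion: float, vram_gb: float = VRAM_GB) -> bool:
    """True if the estimated weights alone would fit in the given VRAM."""
    return estimate_weight_gb(params_billion) <= vram_gb


if __name__ == "__main__":
    models = [
        ("GLM 4.7 Flash", 30),
        ("Nemotron 3", 30),
        ("Qwen3 Coder Next", 80),
    ]
    for name, size_b in models:
        gb = estimate_weight_gb(size_b)
        where = "fits in VRAM" if fits_in_vram(size_b) else "needs system RAM offload"
        print(f"{name}: ~{gb:.0f} GB of weights, {where}")
```

Under these assumptions the two 30B models leave only a little VRAM for context, while the 80B model has to spill into system RAM, which is probably part of why its t/s was hard to tune.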

Thank you in advance

submitted by /u/Dry_Sheepherder5907