Local LLM setup for coding (pair programming style) – GPU vs MacBook Pro?

Hey everyone,

I'm a programmer and I'd love to use local LLMs as a kind of "superpower" to move faster in my day-to-day work.

Typical use case: I'm working on a codebase (Rust, Python, Go, or TypeScript with React/Vue), and I want the model to understand the existing project and implement new features on top of it — ideally writing code directly in my IDE, like a pair programming partner.

Right now I've tried cloud models like Claude, Qwen, ChatGPT, and GLM. Results are honestly great (especially Claude), but cost and privacy are starting to bother me — hence the interest in going local.

My current setup:

  • Ryzen 9 9950X
  • 96 GB DDR5 RAM
  • GPU still to choose

I'm considering a few options and I'm not sure what makes the most sense:

  • Option A: add a GPU
    • Nvidia RTX 5090 (~€3500)
    • AMD R9700, 32 GB (~€1300)

  • Option B: go all-in on a MacBook Pro M5 Max (128 GB RAM, ~€7000)

My main questions:

  1. Are there local LLMs that actually get close to Claude-level performance for coding tasks?

  2. Are there solid benchmarks specifically for coding and codebase-aware edits?

  3. Which local models are currently best for this kind of workflow?

  4. How much VRAM / unified memory do you realistically need for this use case?

  5. Dense vs. MoE models: which works better locally?

  6. Does generation speed really matter that much in real usage (e.g. 45 tok/s vs. 100+ tok/s)?

  7. What tools are people using for this (IDE plugins, local agents, etc.)?

  8. How can I test these setups before dropping thousands on hardware?
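For the memory and speed questions, here's a rough back-of-envelope sketch I've been using to reason about it. It assumes weights dominate memory use (real runtimes add KV cache and overhead on top, so treat these as lower bounds), and the model size and token counts are just illustrative:

```python
# Back-of-envelope estimates: memory needed for a model's weights,
# and wall-clock time for a reply at a given generation speed.
# Ignores KV cache and runtime overhead, so these are lower bounds.

def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights in GB."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

def gen_seconds(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to generate a reply of the given length."""
    return tokens / tokens_per_second

# A 32B-parameter model at common quantization levels:
for bits in (16, 8, 4):
    print(f"32B @ {bits}-bit: ~{weights_gb(32, bits):.0f} GB")
# 4-bit quantization is what lets a 32B model fit in 24-32 GB of VRAM.

# A 600-token answer at the two speeds in question 6:
for speed in (45, 100):
    print(f"{speed} tok/s: {gen_seconds(600, speed):.1f} s")
```

By this math a 4-bit 32B model fits a 32 GB card like the R9700, while 128 GB of unified memory opens up much larger (or less aggressively quantized) models.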

Curious to hear from people who are actually running local setups for real dev work (not just demos). What's your experience like?

submitted by /u/bajis12870