So I'm brand new to this scene, but I'm using Claude to help me fine-tune a model for a startup idea I have in the healthcare space.
I have been working with the 27-35B parameter models (Qwen3.6, Gemma 4) and a couple of 120B+ models (Qwen 3.5, Minimax 2.7). Honestly, I found most of them serviceable, but the tradeoffs in speed and knowledge have been real.
Cue today, when I started using Qwen3-coder-next for MLX and, goddamn, it's the fastest model I've tried (even faster than Qwen 3.5-35B-a3B, my previous fastest), and the output quality has honestly been outstanding — I'd say better than the 120B parameter models. I don't know how many parameters it has, but size-wise it's ~80 GB in memory vs 120 GB for Minimax 2.7 or Qwen 3.5.
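For what it's worth, you can get a ballpark parameter count from the memory footprint if you guess the quantization width. A back-of-the-envelope sketch, assuming a uniform 4-bit or 8-bit MLX quant (real checkpoints also carry embeddings and quantization scales, so treat these as rough figures):

```python
def estimate_params_billions(size_gb: float, bits_per_weight: float) -> float:
    """Approximate parameter count (in billions) from model size and quant width."""
    total_bytes = size_gb * 1e9
    bytes_per_weight = bits_per_weight / 8
    return total_bytes / bytes_per_weight / 1e9

# An ~80 GB model under common quant widths:
for bits in (4, 8):
    print(f"{bits}-bit: ~{estimate_params_billions(80, bits):.0f}B params")
# 4-bit: ~160B params
# 8-bit: ~80B params
```

So an 80 GB footprint is consistent with something in the ~80-160B range depending on the quant, which would line up with it feeling like a step up from the 35B-class models.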
Am I overreacting, or is this the sweet spot for any 128 GB Mac (I'm running an M2 Ultra with 192 GB)?