RTX 5070 Ti + 9800X3D running Qwen3.6-35B-A3B at 79 t/s with 128K context, the –n-cpu-moe flag is the most important part.
Spent an evening dialing in Qwen3.6-35B-A3B on consumer hardware. Fun side note: I had Claude Opus 4.7 (just the $20 sub) build the config, launch the servers in the background, run the benchmarks, read the VRAM splits from the llama.cpp logs, and iter…