LocalLLaMA

Qwen3.6 35B MoE on 8GB VRAM — working llama-server config + a max_tokens / thinking trap I ran into

Hi all, I wanted to share a setup that’s working for me with Qwen3.6-35B-A3B on a laptop RTX 4060 (8GB VRAM) + 96GB RAM. This is not an interactive chat setup. I’m using it as a coding subagent inside an agentic pipeline, so some of the choices below a…
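For context, this is the general shape of the llama-server launch I mean for a MoE model on a small-VRAM GPU. It's a sketch, not my exact command: the model path, context size, and the `--n-cpu-moe` value are placeholders. The idea is to nominally offload all layers with `-ngl` and then push the bulky MoE expert tensors back into system RAM with `--n-cpu-moe`, so the 8GB card only holds attention and shared weights.

```shell
# Sketch of a llama-server launch for a MoE model on an 8GB GPU.
# Path and numbers below are placeholders, not my real config.
#   -ngl 99        : nominally offload all layers to the GPU
#   --n-cpu-moe 99 : keep MoE expert tensors in system RAM, leaving
#                    attention + shared weights on the GPU
llama-server \
  -m ./models/qwen-moe.gguf \
  -ngl 99 \
  --n-cpu-moe 99 \
  -c 16384 \
  --host 127.0.0.1 --port 8080
```

With 96GB of system RAM the expert tensors fit comfortably on the CPU side, and since only a few experts are active per token, generation speed stays usable despite the offload.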