/u/yes_i_tried_google

Running Qwen3.5 / Qwen3.6 with NextN MTP (Multi-Token Prediction) speculative decode in llama.cpp — single RTX 3090 Ti GPU guide

/u/yes_i_tried_google / May 7, 2026

I was asked for this guide, so here it is. Some overlap with someone else’s post from yesterday. YMMV! Too busy with work to write myself, so I asked Opus to write for me (I have validated the content!). I’m sure there will be debate over using q4 blah…

Author name: /u/yes_i_tried_google

Running Qwen3.5 / Qwen3.6 with NextN MTP (Multi-Token Prediction) speculative decode in llama.cpp — single RTX 3090 Ti GPU guide