/u/cleversmoke - Provide.ai

Llama.cpp MTP with Qwen3.6 27B on Headless RTX 3090

/u/cleversmoke / May 17, 2026

I saw some posts about PP (prompt processing) being slower, which made people cautious about trying it. Here's a real-world data point.

Settings:
- Headless RTX 3090 24G
- OpenCode
- Model: unsloth's Qwen3.6-27B-MTP-Q4_K_M.gguf
- 128k context
- q8_0 KV cache
- --spec-draft-n-max: 3
- --d…
