Llama.cpp MTP with Qwen3.6 27B on Headless RTX 3090
Saw some posts about prompt processing (PP) being slower with MTP, so people were cautious about trying it. Here's a real-world datapoint.

Settings:
- Headless RTX 3090 24 GB
- OpenCode
- Model: unsloth's Qwen3.6-27B-MTP-Q4_K_M.gguf
- 128k context
- q8_0 kv cache
- --spec-draft-n-max: 3
- --d…