Here are some results (llama.cpp)!
Task 1: write a short poem
27B Dense: 12.5 tokens/s
27B Dense MTP (spec-draft-n-max 6): 14.5 tokens/s
27B Dense MTP (spec-draft-n-max 3): 18.7 tokens/s
Task 2: edit a hello world html artifact
27B Dense: 12.6 tokens/s
27B Dense MTP (spec-draft-n-max 6): 14.2 tokens/s
27B Dense MTP (spec-draft-n-max 3): 19.8 tokens/s
Task 3: create a hello world html directly in chat
27B Dense: 12.6 tokens/s
27B Dense MTP (spec-draft-n-max 6): 17.9 tokens/s
27B Dense MTP (spec-draft-n-max 3): 23.2 tokens/s
It's fascinating how much the speedup varies by task, and that spec-draft-n-max 3 beat spec-draft-n-max 6 on all three!
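To put numbers on that variation, here is a small sketch that divides each MTP result by the dense baseline for the same task. The tokens/s values are copied from the list above; the dictionary keys and the `speedups` helper are just names made up for this example.

```python
# Tokens/s figures copied from the results above.
results = {
    "poem": {"dense": 12.5, "mtp_n6": 14.5, "mtp_n3": 18.7},
    "edit html artifact": {"dense": 12.6, "mtp_n6": 14.2, "mtp_n3": 19.8},
    "html in chat": {"dense": 12.6, "mtp_n6": 17.9, "mtp_n3": 23.2},
}


def speedups(row):
    """Return each MTP run's throughput relative to the dense baseline."""
    base = row["dense"]
    return {name: tps / base for name, tps in row.items() if name != "dense"}


for task, row in results.items():
    s = speedups(row)
    print(f"{task}: n-max 6 = {s['mtp_n6']:.2f}x, n-max 3 = {s['mtp_n3']:.2f}x")
```

With these figures, spec-draft-n-max 3 lands at roughly 1.5x to 1.8x over the dense model, while spec-draft-n-max 6 stays between about 1.1x and 1.4x.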
https://preview.redd.it/bsrlgslasn1h1.png?width=1802&format=png&auto=webp&s=8aba6c751bf7c47494ce11697b91a4347fec79af
Settings used:
{
  "name": "Qwen3.6-27B-UD-Q4_K_M",
  "file": "Qwen3.6-27B-UD-Q4_K_M.gguf",
  "custom": ["--mmproj", "C:/CarlAI/models/mmproj-Qwen_Qwen3.6-27B-bf16.gguf"],
  "backend": "vulkan",
  "parameters": {
    "temp": 0.8,
    "top_k": 20,
    "top_p": 0.95,
    "min_p": 0.00,
    "repeat_penalty": 1.0,
    "ngl": 99,
    "context_length": 65000,
    "jinja": true,
    "flash_attn": "on"
  }
},
{
  "name": "Qwen3.6-27B-UD-Q4_K_XL_MTP",
  "file": "Qwen3.6-27B-UD-Q4_K_XL_MTP.gguf",
  "custom": ["-np", "1", "--spec-type", "draft-mtp", "--spec-draft-n-max", "6"],
  "backend": "vulkan",
  "parameters": {
    "temp": 0.8,
    "top_k": 20,
    "top_p": 0.95,
    "min_p": 0.00,
    "repeat_penalty": 1.0,
    "ngl": 99,
    "context_length": 65000,
    "jinja": true,
    "flash_attn": "on"
  }
},
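The JSON above is the launcher's own format, not something llama.cpp reads directly. For anyone who wants to try the same thing by hand, here is a rough sketch of how the MTP entry might translate into a `llama-server` command line. The key-to-flag mapping (`FLAG_MAP`), the `build_args` helper, and the `models` directory are assumptions for illustration, not taken from the launcher; only the `custom` flags are passed through exactly as listed. The `backend` key is left out because it presumably selects which llama.cpp build to run rather than a flag.

```python
import shlex

# Second entry from the settings above (the MTP model).
entry = {
    "file": "Qwen3.6-27B-UD-Q4_K_XL_MTP.gguf",
    "custom": ["-np", "1", "--spec-type", "draft-mtp", "--spec-draft-n-max", "6"],
    "parameters": {
        "temp": 0.8, "top_k": 20, "top_p": 0.95, "min_p": 0.00,
        "repeat_penalty": 1.0, "ngl": 99, "context_length": 65000,
        "jinja": True, "flash_attn": "on",
    },
}

# ASSUMPTION: a guess at how each "parameters" key maps to a llama-server flag.
FLAG_MAP = {
    "temp": "--temp",
    "top_k": "--top-k",
    "top_p": "--top-p",
    "min_p": "--min-p",
    "repeat_penalty": "--repeat-penalty",
    "ngl": "-ngl",
    "context_length": "-c",
    "flash_attn": "-fa",
}


def build_args(entry, models_dir="models"):
    """Build an argv list for llama-server (models_dir is a hypothetical path)."""
    args = ["llama-server", "-m", f"{models_dir}/{entry['file']}"]
    for key, value in entry["parameters"].items():
        if key == "jinja":
            # Boolean switch: present or absent, takes no value.
            if value:
                args.append("--jinja")
            continue
        args += [FLAG_MAP[key], str(value)]
    # Extra flags are passed through unchanged.
    return args + entry["custom"]


print(shlex.join(build_args(entry)))
```

Swapping the final `"6"` in `custom` for `"3"` would give the spec-draft-n-max 3 variant from the results.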