Qwen 3.6-27B Dense with MTP on Strix Halo Windows – Benchmarks

Here are some results (llama.cpp)!

Task 1: write a short poem
27B Dense: 12.5 tokens/s
27B Dense MTP (spec-draft-n-max 6): 14.5 tokens/s
27B Dense MTP (spec-draft-n-max 3): 18.7 tokens/s

Task 2: edit a hello world HTML artifact
27B Dense: 12.6 tokens/s
27B Dense MTP (spec-draft-n-max 6): 14.2 tokens/s
27B Dense MTP (spec-draft-n-max 3): 19.8 tokens/s

Task 3: create a hello world HTML page directly in chat
27B Dense: 12.6 tokens/s
27B Dense MTP (spec-draft-n-max 6): 17.9 tokens/s
27B Dense MTP (spec-draft-n-max 3): 23.2 tokens/s

It's fascinating how much the MTP speedup varies by task!
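To put numbers on that variation, here is a small Python sketch that divides each MTP result by the dense baseline for the same task. The throughput figures are copied from the results above; the dictionary keys are just illustrative labels, not anything from llama.cpp.

```python
# Throughput figures (tokens/s) copied from the benchmark results above.
RESULTS = {
    "Task 1: short poem": {"dense": 12.5, "mtp_n6": 14.5, "mtp_n3": 18.7},
    "Task 2: edit hello world HTML artifact": {"dense": 12.6, "mtp_n6": 14.2, "mtp_n3": 19.8},
    "Task 3: hello world HTML in chat": {"dense": 12.6, "mtp_n6": 17.9, "mtp_n3": 23.2},
}


def speedups(results):
    """Return each MTP run's throughput divided by the dense baseline, per task."""
    out = {}
    for task, runs in results.items():
        base = runs["dense"]
        out[task] = {
            name: round(tps / base, 2)  # ratio vs. the non-MTP run, 2 decimals
            for name, tps in runs.items()
            if name != "dense"
        }
    return out


if __name__ == "__main__":
    for task, ratios in speedups(RESULTS).items():
        print(f"{task}: n-max 6 = {ratios['mtp_n6']}x, n-max 3 = {ratios['mtp_n3']}x")
```

With the posted numbers this works out to roughly 1.16x (n-max 6) and 1.50x (n-max 3) for the poem, 1.13x and 1.57x for the HTML edit, and 1.42x and 1.84x for the in-chat HTML, so n-max 3 came out ahead on every task.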

https://preview.redd.it/bsrlgslasn1h1.png?width=1802&format=png&auto=webp&s=8aba6c751bf7c47494ce11697b91a4347fec79af

Settings used:

[
  {
    "name": "Qwen3.6-27B-UD-Q4_K_M",
    "file": "Qwen3.6-27B-UD-Q4_K_M.gguf",
    "custom": ["--mmproj", "C:/CarlAI/models/mmproj-Qwen_Qwen3.6-27B-bf16.gguf"],
    "backend": "vulkan",
    "parameters": {
      "temp": 0.8,
      "top_k": 20,
      "top_p": 0.95,
      "min_p": 0.00,
      "repeat_penalty": 1.0,
      "ngl": 99,
      "context_length": 65000,
      "jinja": true,
      "flash_attn": "on"
    }
  },
  {
    "name": "Qwen3.6-27B-UD-Q4_K_XL_MTP",
    "file": "Qwen3.6-27B-UD-Q4_K_XL_MTP.gguf",
    "custom": ["-np", "1", "--spec-type", "draft-mtp", "--spec-draft-n-max", "6"],
    "backend": "vulkan",
    "parameters": {
      "temp": 0.8,
      "top_k": 20,
      "top_p": 0.95,
      "min_p": 0.00,
      "repeat_penalty": 1.0,
      "ngl": 99,
      "context_length": 65000,
      "jinja": true,
      "flash_attn": "on"
    }
  }
]
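For anyone who wants to try this with plain llama-server instead of a launcher, here is a sketch of one plausible way an entry like the ones above could be turned into a command line. The key-to-flag mapping is an assumption, since the launcher that reads this config isn't named. "backend" is ignored on the assumption that Vulkan comes from using a Vulkan build of llama.cpp. The "custom" list, including the MTP options, is passed through verbatim. with_draft_n_max is a hypothetical helper for switching the same entry to the n-max 3 run.

```python
# Assumed mapping from the launcher's "parameters" keys to llama-server flags.
# How the author's launcher actually translates them is not stated in the post.
FLAG_MAP = {
    "temp": "--temp",
    "top_k": "--top-k",
    "top_p": "--top-p",
    "min_p": "--min-p",
    "repeat_penalty": "--repeat-penalty",
    "ngl": "-ngl",
    "context_length": "-c",
    "flash_attn": "--flash-attn",
}


def build_args(entry):
    """Turn one config entry into a llama-server argument list (sketch)."""
    args = ["llama-server", "-m", entry["file"]]
    for key, value in entry["parameters"].items():
        if key == "jinja":
            if value:
                args.append("--jinja")  # boolean switch, takes no value
        elif key in FLAG_MAP:
            args += [FLAG_MAP[key], str(value)]
    # "backend" is skipped: assumed to be chosen by which llama.cpp build you run.
    # "custom" holds extra flags passed through unchanged (e.g. the MTP options).
    args += entry.get("custom", [])
    return args


def with_draft_n_max(entry, n):
    """Hypothetical helper: copy of entry with the --spec-draft-n-max value replaced."""
    custom = list(entry.get("custom", []))
    i = custom.index("--spec-draft-n-max")
    custom[i + 1] = str(n)
    return {**entry, "custom": custom}


MTP_ENTRY = {
    "name": "Qwen3.6-27B-UD-Q4_K_XL_MTP",
    "file": "Qwen3.6-27B-UD-Q4_K_XL_MTP.gguf",
    "custom": ["-np", "1", "--spec-type", "draft-mtp", "--spec-draft-n-max", "6"],
    "backend": "vulkan",
    "parameters": {
        "temp": 0.8, "top_k": 20, "top_p": 0.95, "min_p": 0.00,
        "repeat_penalty": 1.0, "ngl": 99, "context_length": 65000,
        "jinja": True, "flash_attn": "on",
    },
}

if __name__ == "__main__":
    print(" ".join(build_args(MTP_ENTRY)))
    print(" ".join(build_args(with_draft_n_max(MTP_ENTRY, 3))))
```

Copying the entry in with_draft_n_max (rather than editing it in place) keeps the n-max 6 and n-max 3 runs as two independent configs.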

submitted by /u/PromptInjection_