
Strix Halo Llama.cpp MTP Benchmarks: 27B Gets Much Faster, 35B Is Mixed

/u/xjE4644Eyc / May 16, 2026

TL;DR

All models were Qwen3.6.

27B-MTP vs Base 27B (15k single-turn): faster overall

- Total Time (wall): 87.44s → 77.39s (10.05s faster / -11.50%)
- Generation: 7.63 → 16.15 t/s (+111.77% speedup)
- Prompt Processing: 279.75 → 244.90 t/s (-12.46% slowdown)

…
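For anyone who wants to sanity-check the percentages, here is a minimal sketch that recomputes the relative changes from the rounded 27B figures quoted in the TL;DR. The helper name `pct_change` and the dictionary keys are made up for illustration and are not part of llama.cpp or the original benchmark script. Because the inputs are already rounded, the recomputed values can differ from the posted ones in the second decimal place.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Rounded figures as quoted in the TL;DR (base 27B -> 27B-MTP, 15k single-turn).
# The key names are hypothetical labels chosen for this example.
results = {
    "total_time_s": (87.44, 77.39),
    "generation_tps": (7.63, 16.15),
    "prompt_processing_tps": (279.75, 244.90),
}

changes = {name: pct_change(before, after) for name, (before, after) in results.items()}

for name, value in changes.items():
    print(f"{name}: {value:+.2f}%")
```

This lands at roughly -11.5% wall time, +111.7% generation speed, and -12.5% prompt processing speed, in line with the posted numbers. The small gap on the generation figure presumably comes from the post using unrounded measurements.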
