Hello everyone, I am banging my head trying to properly configure Qwen 3.6 27B MTP in vLLM.
I am using vLLM v0.20.0 in Docker, unquantized model with TP4 (4x 3090s), max context length.
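For reference, this is roughly the kind of launch I am using. The exact speculative-config keys (in particular `"method": "mtp"`) are an assumption on my part and may differ across vLLM versions, so treat this as a sketch rather than a known-good command:

```shell
# Sketch of the Docker launch, assuming the vllm/vllm-openai image and
# JSON --speculative-config syntax; verify flags against your vLLM version.
docker run --gpus all --ipc=host -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model <model-path> \
  --tensor-parallel-size 4 \
  --max-model-len 262144 \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 3}'
```

Dropping the `--speculative-config` line is how I disable MTP for the comparison numbers below.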
At low context sizes, MTP with a value of 3 gives the best results: 48-50 tps generation speed. However, once the context grows larger (>70-80k), the tps drops to 15-20.
Without MTP I start at 30 tps, degrading to 26-27 tps at large context.
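My working hypothesis is that the draft acceptance rate falls off at long context, at which point the verification overhead eats the speedup. A quick back-of-envelope using the standard expected-tokens-per-step formula for speculative decoding (the acceptance rates below are made-up illustrative values, not measurements):

```python
def expected_tokens_per_step(a: float, k: int) -> float:
    """Expected tokens accepted per verification step with k draft tokens
    and per-token acceptance rate a (speculative decoding analysis,
    Leviathan et al. 2023): (1 - a^(k+1)) / (1 - a)."""
    if a >= 1.0:
        return k + 1.0  # every draft token accepted, plus the bonus token
    return (1.0 - a ** (k + 1)) / (1.0 - a)

# Hypothetical acceptance rates: high at short context, low at long context.
for a in (0.9, 0.5):
    print(f"a={a}: {expected_tokens_per_step(a, k=3):.2f} tokens/step")
```

With k=3, dropping the acceptance rate from 0.9 to 0.5 cuts the expected yield from ~3.4 to ~1.9 tokens per step, and since each step still pays for drafting plus a 4-token verification pass, it is easy to end up slower than plain decoding, which would match the numbers I am seeing.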
For now I have disabled it, since I am testing agentic coding, and even when I try to keep the context below 50% of the window (120-130k), I still go over 70k pretty often.
Any advice would be welcome.