Get faster Qwen3.6 27B

Using 100k context on a 3090 with the MTP GGUF and getting 50 t/s on llama.cpp

Thought I'd share the setup.

Use https://huggingface.co/RDson/Qwen3.6-27B-MTP-Q4_K_M-GGUF

And am17an's commit (a llama.cpp build with MTP support):

/media/adam/D_DRIVE/LLM/llama-cpp-am17an/build/bin/llama-server \
  -m "/media/Qwen3.6-27B-Q4/Qwen3.6-27B-MTP-Q4_K_M.gguf" \
  --ctx-size 100000 \
  -ngl 99 -fa on \
  --cache-type-k q4_0 --cache-type-v q4_0 \
  --batch-size 2048 --ubatch-size 1024 \
  --spec-type mtp --spec-draft-n-max 2

Note: a spec draft of 3 seemed too much for the 3090 at higher context.
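The draft-length tradeoff above can be sketched with the standard speculative-decoding expectation: with a per-token acceptance probability p and a draft of n tokens, each verification pass emits (1 - p^(n+1)) / (1 - p) tokens on average. The p value below is a made-up illustrative number, not a measurement from this setup:

```python
# Expected tokens emitted per target-model verification pass in
# speculative decoding, assuming an i.i.d. acceptance probability p
# per drafted token. Closed form: E = (1 - p^(n+1)) / (1 - p).
# p = 0.8 is purely illustrative, not measured on this 3090 setup.

def expected_tokens_per_step(p: float, n: int) -> float:
    """Expected tokens per forward pass with draft length n."""
    if p == 1.0:
        return n + 1.0
    return (1 - p ** (n + 1)) / (1 - p)

for n in (1, 2, 3):
    print(f"draft n={n}: {expected_tokens_per_step(0.8, n):.3f} tokens/step")
```

Each extra draft token adds diminishing expected output while every drafted token still has to be verified, which is one plausible reason going from `--spec-draft-n-max 2` to 3 stops paying off at long context.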

Why only 100k context? Besides the slowdown at longer contexts, 100k is enough for most tasks; then compact and continue.
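A back-of-envelope sketch of why the q4_0 K/V cache flags matter for fitting 100k context on a 24 GB card. The layer/head numbers below are placeholder assumptions for illustration, not the real Qwen3.6-27B config (check the GGUF metadata for the actual values); the q4_0 size uses the format's 18-byte blocks of 32 values:

```python
# Rough KV-cache size at a given context length. Architecture numbers
# (n_layers, n_kv_heads, head_dim) are ASSUMED placeholders, not taken
# from the actual Qwen3.6-27B GGUF.

def kv_cache_gib(ctx, n_layers, n_kv_heads, head_dim, bytes_per_value):
    # K and V each hold n_kv_heads * head_dim values per layer per token.
    values = 2 * n_layers * n_kv_heads * head_dim * ctx
    return values * bytes_per_value / 1024**3

ASSUMED = dict(ctx=100_000, n_layers=48, n_kv_heads=8, head_dim=128)

# fp16: 2 bytes/value; q4_0: 18-byte blocks of 32 values = 0.5625 bytes/value
print(f"fp16 KV: {kv_cache_gib(**ASSUMED, bytes_per_value=2.0):.1f} GiB")
print(f"q4_0 KV: {kv_cache_gib(**ASSUMED, bytes_per_value=18 / 32):.2f} GiB")
```

Whatever the exact layer count, q4_0 K/V is about 28% the size of fp16, which is the difference between the cache fitting alongside the Q4_K_M weights on a 3090 or not.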

submitted by /u/admajic
