How do I run Qwen3.5-27B with speculative decoding using llama.cpp's llama-server?
I run it on 2x RTX 3090. This is part of my llama-server presets file:

```ini
[Qwen3.5-27B-bartowski]
load-on-startup = true
alias = Qwen3.5-27B-bartowski
hf = bartowski/Qwen_Qwen3.5-27B-GGUF:Q8_0
hfd = bartowski/Qwen_Qwen3.5-2B-GGUF:Q8_0
draft-min = 1
draft-m…
```
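For reference, the preset keys above generally mirror llama-server's command-line flags, so an equivalent direct invocation might look like the sketch below. This is an assumption-laden example, not a tested recipe: the model repos are taken from the preset above, and the split of layers across the two 3090s (`--tensor-split`) and the draft-model device (`-devd`) are illustrative values you would tune for your own VRAM budget.

```shell
# Hypothetical CLI equivalent of the preset (flag values are examples, not
# verified settings for this model pair):
llama-server \
  -hf bartowski/Qwen_Qwen3.5-27B-GGUF:Q8_0 \      # main (target) model
  -hfd bartowski/Qwen_Qwen3.5-2B-GGUF:Q8_0 \      # small draft model
  --draft-min 1 \                                  # min draft tokens per step
  --draft-max 16 \                                 # max draft tokens (example value)
  -ngl 99 -ngld 99 \                               # offload all layers (main + draft)
  --tensor-split 1,1 \                             # split main model across both GPUs
  -devd CUDA0                                      # keep the draft model on one GPU
```

The usual tuning approach is to watch the `/metrics` or server log output for the draft acceptance rate: if acceptance is low, a smaller `--draft-max` (or a better-matched draft model) tends to help, since rejected draft tokens waste the speedup.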