MTP – The proof's in the puddin'! Using it with Qwen3.6-27B

Been running llama.cpp MTP (multi-token prediction) with Qwen3.6-27B Q4_K_M as my daily coding assistant and got curious about what was actually happening under the hood. Pulled the metrics from llama-server and charted a full session.
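
For anyone who wants to try the same thing, here is a minimal sketch of how the metrics could be pulled. It is not my exact script. It assumes llama-server was started with `--metrics` and is reachable at `http://localhost:8080/metrics` (adjust to your setup), and the metric names and values in `SAMPLE` are illustrative placeholders, not numbers from my session. `parse_metrics` and `fetch_metrics` are hypothetical helper names.

```python
import urllib.request


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus-style text lines ("name value") into a dict."""
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        if not name:
            continue
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # ignore anything that is not a plain number
    return metrics


def fetch_metrics(url: str = "http://localhost:8080/metrics") -> dict[str, float]:
    """Fetch one snapshot; the URL is an assumption about your setup."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Illustrative sample only: names and values are placeholders.
SAMPLE = """\
# TYPE llamacpp:predicted_tokens_seconds gauge
llamacpp:predicted_tokens_seconds 21.5
llamacpp:prompt_tokens_seconds 480.0
"""

snapshot = parse_metrics(SAMPLE)
print(snapshot)
```

Calling `fetch_metrics()` on a timer and appending each snapshot to a CSV gives you enough to chart a whole session.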

A few things stood out: generation speed tanks hard past 85K context (down 30-35% by 95K+), and cold prefills are brutal, but the KV cache slot-save feature is doing serious heavy lifting on hit rate. Config details and observations are below. Happy to answer questions.
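
To make the "down 30-35%" figure concrete, this is roughly how a slowdown like that can be computed from logged samples. It is a sketch only: the `(context_tokens, tokens_per_second)` pairs below are made-up placeholders, not my measurements, and `mean_rate` and `relative_drop` are hypothetical helper names.

```python
from statistics import mean


def mean_rate(samples, lo, hi):
    """Average tokens/s over samples whose context size is in [lo, hi)."""
    return mean(tps for ctx, tps in samples if lo <= ctx < hi)


def relative_drop(baseline, current):
    """Fractional slowdown of `current` relative to `baseline`."""
    return 1.0 - current / baseline


# Placeholder samples: (context tokens, generation tokens/s).
samples = [(10_000, 20.0), (40_000, 20.0), (96_000, 13.0), (98_000, 13.0)]

baseline = mean_rate(samples, 0, 85_000)         # below the 85K knee
long_ctx = mean_rate(samples, 95_000, 200_000)   # the 95K+ region
drop = relative_drop(baseline, long_ctx)

print(f"slowdown at 95K+: {drop:.0%}")
```

The 85K and 95K bucket edges mirror the thresholds mentioned above. With real logs you would pick the edges from wherever the curve actually bends.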

Referring to this post: Get Faster Qwen3.6 27b

https://preview.redd.it/5o7u2v3qonzg1.png?width=656&format=png&auto=webp&s=6fcfad15edfd89599b18cca0bef726414d2d32f0

submitted by /u/admajic