I've been running llama.cpp MTP with Qwen3.6-27B Q4_K_M as my daily coding assistant and got curious about what was actually happening under the hood, so I pulled the metrics from llama-server and charted a full session. A few things stood out: generation speed tanks hard past 85K context (down 30-35% by 95K+), and cold prefills are brutal, but the KV cache slot-save feature is doing serious heavy lifting on hit rate. Config details and observations are below; happy to answer questions. This refers to the post "Get Faster Qwen3.6 27b".
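For anyone who wants to pull the same kind of numbers, here is a minimal sketch of scraping llama-server and working out a slowdown percentage. It assumes the server was started with `--metrics`, so that it serves a Prometheus-style text page at `/metrics`, and that it listens on `http://127.0.0.1:8080`. The helper names (`parse_metrics`, `fetch_metrics`, `percent_drop`) are made up for illustration, and the metric names and values in `SAMPLE` are placeholders rather than measurements from this session, so check them against what your build actually prints.

```python
import urllib.request

# Placeholder text in Prometheus exposition format. The metric names are
# assumptions and the values are dummies, not measurements.
SAMPLE = """\
# HELP llamacpp:predicted_tokens_seconds Average generation throughput in tokens/s.
# TYPE llamacpp:predicted_tokens_seconds gauge
llamacpp:predicted_tokens_seconds 20.0
llamacpp:prompt_tokens_seconds{slot="0"} 400.0
"""


def parse_metrics(text):
    """Parse Prometheus text lines into {metric_name: float}, dropping labels."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line and "}" in line:
            # Form: name{label="x"} value [timestamp]
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            # Form: name value [timestamp]
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        try:
            samples[name] = float(fields[0])
        except ValueError:
            continue  # ignore lines whose value is not numeric
    return samples


def fetch_metrics(base_url="http://127.0.0.1:8080"):
    """Fetch and parse /metrics from a running server (base_url is an assumption)."""
    with urllib.request.urlopen(base_url + "/metrics", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


def percent_drop(baseline, current):
    """Slowdown of `current` relative to `baseline`, in percent."""
    return 100.0 * (baseline - current) / baseline
```

Calling `fetch_metrics()` on a timer and logging the result alongside the current context length gives a series that can be charted, and `percent_drop` then compares generation speed at short context against speed at long context.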