LocalLLaMA

Qwen 3.6 27B IQ4_XS – 22 t/s on RTX 5060 Ti 16GB, 24k ctx

Maybe this will be helpful for someone:

llama-server -m '/Qwen3.6-27B/Qwen3.6-27B-IQ4_XS.gguf' -ngl 999 -ctk q4_0 -ctv q4_0 -b 128 -ub 128 -c 24000

I can't run this model with higher KV quants at context sizes above 8192. Setting -ub and -b to 256 allowed me f…
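
For anyone wondering why the KV cache type matters so much at 24k context, here is a rough sketch of the arithmetic in Python. The per-element sizes follow the ggml block layouts (q4_0 and q8_0 pack 32 values plus one fp16 scale per block). The architecture numbers in ARCH are hypothetical placeholders, not the real Qwen3.6-27B values, so read the actual ones from the GGUF metadata before trusting any absolute figure. It also assumes plain attention layers, so n_layers should count only the layers that actually keep a KV cache.

```python
# Bytes per cached element for some llama.cpp KV cache types.
BYTES_PER_ELEM = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # 32 int8 values (32 bytes) + 2-byte fp16 scale
    "q4_0": 18 / 32,  # 32 4-bit values (16 bytes) + 2-byte fp16 scale
}


def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, cache_type):
    """Estimate the combined K and V cache size in bytes."""
    elems = 2 * n_ctx * n_layers * n_kv_heads * head_dim  # 2 = K and V
    return elems * BYTES_PER_ELEM[cache_type]


# HYPOTHETICAL architecture values, for illustration only.
# These are NOT taken from the Qwen3.6-27B model card.
ARCH = dict(n_layers=64, n_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for cache_type in ("f16", "q8_0", "q4_0"):
        size = kv_cache_bytes(24000, cache_type=cache_type, **ARCH)
        print(f"{cache_type}: {size / 2**30:.2f} GiB")
```

Whatever the real architecture is, the ratio holds: q4_0 takes about 3.6x less memory per cached token than f16 and a bit under 2x less than q8_0, which may be why the higher KV quants stop fitting in 16GB past a certain context size.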