2 old RTX 2080 Tis with 22GB VRAM each: Qwen3.6 27B at 38 tokens/s with f16 KV cache

PLEASE KEEP IN MIND BOTH OF MY CARDS ARE POWER LIMITED TO 150W (I hate noise).

Just wanted to share my current setup; it might help some users out there.

```yaml
services:
  llama-server:
    image: ghcr.io/ggml-org/llama.cpp:full-cuda12-b9128
    cont…
```
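If you want to reproduce the power limit, here's a minimal sketch of how you'd cap both cards with `nvidia-smi` (the GPU indices 0 and 1 are assumptions for a two-card box; check yours with `nvidia-smi -L`):

```sh
# Enable persistence mode so the driver keeps settings between processes
sudo nvidia-smi -pm 1

# Cap each 2080 Ti at 150W (indices 0 and 1 assumed; list yours with `nvidia-smi -L`)
sudo nvidia-smi -i 0 -pl 150
sudo nvidia-smi -i 1 -pl 150
```

Note that the limit resets on reboot, so you'd want to run this from a startup script or systemd unit.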
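Since the compose snippet above got cut off, here's a hedged sketch of what a complete llama-server service block can look like. The model path, tensor split, and port are placeholders, not my exact values; the flags themselves (`--n-gpu-layers`, `--tensor-split`, `--cache-type-k`/`--cache-type-v`) are standard llama-server options:

```yaml
services:
  llama-server:
    image: ghcr.io/ggml-org/llama.cpp:full-cuda12-b9128
    # the "full" image entrypoint picks a tool; --server starts llama-server
    command: >
      --server
      -m /models/your-model.gguf
      --n-gpu-layers 99
      --tensor-split 1,1
      --cache-type-k f16
      --cache-type-v f16
      --host 0.0.0.0
      --port 8080
    volumes:
      - ./models:/models
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            # expose both GPUs to the container
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

`--tensor-split 1,1` splits the weights evenly across the two cards; f16 is the default KV cache type, but spelling it out makes the benchmark config explicit.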