2 old RTX 2080 Tis with 22GB VRAM each: Qwen3.6 27B at 38 tokens/s with f16 KV cache

PLEASE KEEP IN MIND BOTH OF MY CARDS ARE POWER LIMITED TO 150W (I hate noise).

Just wanted to share my current setup; it might help some users out there.

```yaml
services:
  llama-server:
    image: ghcr.io/ggml-org/llama.cpp:full-cuda12-b9128
    cont…
```
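If you want to reproduce the power limit, here's a minimal sketch of how you'd cap both cards with `nvidia-smi` (the GPU indices 0 and 1 are assumptions for a two-card box; check yours with `nvidia-smi -L`):

```sh
# Enable persistence mode so the driver keeps settings between processes
sudo nvidia-smi -pm 1

# Cap each 2080 Ti at 150W (indices 0 and 1 assumed; list yours with `nvidia-smi -L`)
sudo nvidia-smi -i 0 -pl 150
sudo nvidia-smi -i 1 -pl 150
```

Note that the limit resets on reboot, so you'd want to run this from a startup script or systemd unit.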
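Since the compose snippet above got cut off, here's a hedged sketch of what a complete llama-server service block can look like. The model path, tensor split, and port are placeholders, not my exact values; the flags themselves (`--n-gpu-layers`, `--tensor-split`, `--cache-type-k`/`--cache-type-v`) are standard llama-server options:

```yaml
services:
  llama-server:
    image: ghcr.io/ggml-org/llama.cpp:full-cuda12-b9128
    # the "full" image entrypoint picks a tool; --server starts llama-server
    command: >
      --server
      -m /models/your-model.gguf
      --n-gpu-layers 99
      --tensor-split 1,1
      --cache-type-k f16
      --cache-type-v f16
      --host 0.0.0.0
      --port 8080
    volumes:
      - ./models:/models
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            # expose both GPUs to the container
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

`--tensor-split 1,1` splits the weights evenly across the two cards; f16 is the default KV cache type, but spelling it out makes the benchmark config explicit.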