GPoUr with ~12 GB VRAM and a 3080 getting 40 tg/s on qwen3.6 35BA3B w/ 260k ctx
TheTom's GPU-accelerated turboquant (turbo3) has unlocked big context-length gains for the 35BA3B family. I can now get ~40 tg/s with the following GPU-poor compilation flags and configuration: cmake -B build -DGGML_CUDA=ON -DGGML_…