/u/herpnderpler - Provide.ai

GPoUr with ~12gb vram and a 3080 getting 40tg/s on qwen3.6 35BA3B w/ 260k ctx

/u/herpnderpler / April 16, 2026

The TheTom's turboquant's GPU accelerated turboquant (turbo3) has unlocked high context gains for the 35BA3B family. I can now achieve ~40tg/s via the following GPU-POOR compilation flags and configuration: cmake -B build -DGGML_CUDA=ON -DGGML_…

Author name: /u/herpnderpler

GPoUr with ~12gb vram and a 3080 getting 40tg/s on qwen3.6 35BA3B w/ 260k ctx