/u/sandropuppo - Provide.ai

Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090

/u/sandropuppo / April 27, 2026

Hey fellow Llamas, your time is precious, so I'll keep it short. We built a GGUF port of DFlash speculative decoding. It's a standalone C++/CUDA stack on top of ggml that runs the new Qwen3.6-27B on a single 24 GB RTX 3090. We call it Luce DFla…
