DFlash speculative decoding on Apple Silicon: 4.1x on Qwen3.5-9B, now open source (MLX, M5 Max)
A few days ago I posted early results from a native MLX implementation of DFlash. Since then I've rewritten the benchmark methodology, fixed numerical issues, and open-sourced the whole thing. A small draft model generates 16 tokens in parallel via bl…
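
For readers new to speculative decoding, here is a minimal sketch of the verification step that usually follows a drafted block. It assumes plain greedy acceptance: the target model scores the whole drafted block in one forward pass, the longest prefix that matches the target's own greedy choices is kept, and one token from the target is appended. This is the textbook scheme, not necessarily what DFlash or this MLX port does, and `accept_greedy`, `draft_tokens`, and `target_preds` are hypothetical names used only for illustration.

```python
from typing import List


def accept_greedy(draft_tokens: List[int], target_preds: List[int]) -> List[int]:
    """Return the tokens committed after one draft-and-verify step.

    Assumptions (illustrative, not taken from the DFlash code):
    - draft_tokens holds the block proposed by the draft model (e.g. 16 tokens).
    - target_preds[i] is the target model's greedy token at position i, given
      the prompt plus draft_tokens[:i], so it has len(draft_tokens) + 1 entries.
    """
    if len(target_preds) != len(draft_tokens) + 1:
        raise ValueError("target_preds must have one more entry than draft_tokens")

    # Count how many drafted tokens agree with the target's greedy choice.
    n_accepted = 0
    while n_accepted < len(draft_tokens) and draft_tokens[n_accepted] == target_preds[n_accepted]:
        n_accepted += 1

    # Keep the agreeing prefix, then append the target's own token at the first
    # mismatch (or its extra token if the entire block was accepted).
    return draft_tokens[:n_accepted] + [target_preds[n_accepted]]


if __name__ == "__main__":
    # Two drafted tokens match, the third does not: three tokens are committed.
    print(accept_greedy([5, 7, 9, 2], [5, 7, 1, 2, 8]))
```

Under this scheme every target forward pass commits at least one token and at most the block size plus one, which is where the speedup over one-token-at-a-time decoding comes from when the draft model agrees with the target often.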