Author name: /u/No_Shift_4543

DFlash speculative decoding on Apple Silicon: 4.1x on Qwen3.5-9B, now open source (MLX, M5 Max)

/u/No_Shift_4543 / April 13, 2026

A few days ago I posted early results from a native MLX implementation of DFlash. Since then I rewrote the benchmark methodology, fixed numerical issues, and open-sourced the whole thing. A small draft model generates 16 tokens in parallel via bl…

LocalLLaMA

DFlash speculative decoding on Apple Silicon: 85 tok/s, 3.3x on Qwen3.5-9B (MLX, M5 Max)

/u/No_Shift_4543 / April 11, 2026

I'm building a native MLX implementation of DFlash (paper) for Apple Silicon. A small draft model generates 16 tokens in parallel via block diffusion, and the target verifies them in one forward pass. Output is bit-for-bit identical to baseline (…
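
For readers new to the draft-then-verify idea, here is a minimal sketch of greedy block verification in plain Python. It is not the DFlash or MLX code: `toy_target`, `toy_draft`, `verify_block`, and both decode loops are hypothetical stand-ins, the "models" are toy functions, and the draft here is a simple loop rather than block diffusion. It only illustrates why the accepted tokens match what the target alone would produce under greedy decoding.

```python
from typing import Callable, List, Tuple

def toy_target(seq: List[int]) -> List[int]:
    # Hypothetical target "forward pass": element i is the greedy next
    # token after seq[: i + 1]. All positions are scored in one call.
    return [(t + 1) % 50 for t in seq]

def toy_draft(prefix: List[int], k: int) -> List[int]:
    # Hypothetical draft: proposes k tokens, deliberately wrong at the
    # third position so that verification has something to reject.
    out, last = [], prefix[-1]
    for i in range(k):
        last = (last + 1) % 50
        if i == 2:
            last = (last + 7) % 50
        out.append(last)
    return out

def verify_block(prefix: List[int], draft: List[int],
                 target: Callable[[List[int]], List[int]]) -> List[int]:
    # One target call over prefix + draft, then keep the longest draft
    # prefix that agrees with the target's own greedy choices.
    preds = target(prefix + draft)
    accepted: List[int] = []
    for i, tok in enumerate(draft):
        expected = preds[len(prefix) - 1 + i]
        if tok != expected:
            accepted.append(expected)  # replace the first mismatch
            return accepted
        accepted.append(tok)
    accepted.append(preds[-1])  # whole block accepted: one bonus token
    return accepted

def baseline_decode(prompt: List[int], n: int) -> List[int]:
    # Plain greedy decoding: one target call per generated token.
    seq = list(prompt)
    while len(seq) < len(prompt) + n:
        seq.append(toy_target(seq)[-1])
    return seq[len(prompt):]

def speculative_decode(prompt: List[int], n: int,
                       k: int = 16) -> Tuple[List[int], int]:
    # Draft a block, verify it in one target call, repeat.
    seq, passes = list(prompt), 0
    while len(seq) < len(prompt) + n:
        seq.extend(verify_block(seq, toy_draft(seq, k), toy_target))
        passes += 1
    return seq[len(prompt):len(prompt) + n], passes
```

In this toy setup each verification pass yields three tokens (two accepted draft tokens plus one correction), so the target is called less often than in `baseline_decode` while the emitted tokens stay the same. Real speedups depend on how often the draft agrees with the target and on the relative cost of the two models.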