DFlash Doubles the T/S Gen Speed of Qwen3.5 27B (BF16) on Mac M5 Max

The new DFlash support in oMLX 0.3.5 RC1 looks like it doubles (!!!) the generation speed of Qwen3.5 27B (BF16). In an initial test, generation went from 9 to 22 T/S (tokens per second)!
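For anyone who wants the arithmetic spelled out, here is a quick sketch using only the 9 and 22 T/S figures above. The 1000-token response length is a made-up example for illustration, not something I measured, and the function names are just my own helpers.

```python
def speedup(baseline_tps: float, new_tps: float) -> float:
    """Ratio of the new generation speed to the baseline speed."""
    return new_tps / baseline_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Wall-clock seconds to generate `tokens` at a steady `tps`."""
    return tokens / tps


baseline_tps = 9.0   # reported T/S without DFlash
dflash_tps = 22.0    # reported T/S with DFlash
example_tokens = 1000  # hypothetical response length, not from the test

ratio = speedup(baseline_tps, dflash_tps)
print(f"speedup: {ratio:.2f}x")
print(f"{example_tokens} tokens without DFlash: "
      f"{seconds_for(example_tokens, baseline_tps):.1f} s")
print(f"{example_tokens} tokens with DFlash: "
      f"{seconds_for(example_tokens, dflash_tps):.1f} s")
```

So on these numbers it is actually a bit better than a straight doubling, though this is a single initial run and real speeds will vary with prompt and context length.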

Models used (HuggingFace)

Main Model: Jackrong/MLX-Qwopus3.5-27B-v3-bf16
Draft Model: z-lab/Qwen3.5-27B-DFlash

System: M5 Max 128GB

DFlash on GitHub: https://github.com/bstnxbt/dflash-mlx?tab=readme-ov-file

oMLX (v0.3.5 RC1): https://omlx.ai

I'm not affiliated with any of the developers. The Qwen3.5 27B model is so good for its size, with speed being the only thing holding it back, that I thought this might help people deploy it locally at higher quants or full weights.

I've yet to test it with OpenCode or any other harness.

submitted by /u/MiaBchDave