New Gemma 4 MTP on MLX?

By /u/purealgo / May 7, 2026

In case you haven't heard, Google just released Multi-Token Prediction (MTP) drafters for Gemma 4. It's a speculative decoding approach that pairs the main model with a lightweight drafter: the drafter predicts several tokens ahead, and the main model then verifies them in parallel, speeding up inference by 2-3x.
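For anyone unfamiliar with the draft-then-verify idea, here is a minimal toy sketch of greedy speculative decoding in plain Python. It is not Gemma 4's MTP implementation and it does not use any MLX API; `target_next`, `drafter_next`, `speculative_step`, and `generate` are hypothetical names, and the two "models" are simple functions over integer tokens made up for illustration.

```python
from typing import Callable, List, Tuple

# A toy "model" maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def target_next(tokens: List[int]) -> int:
    """Hypothetical main model: always continues with (last + 1) mod 10."""
    return (tokens[-1] + 1) % 10


def drafter_next(tokens: List[int]) -> int:
    """Hypothetical drafter: agrees with the target except right after token 4."""
    if tokens[-1] == 4:
        return 0  # deliberate wrong guess, to exercise the rejection path
    return (tokens[-1] + 1) % 10


def speculative_step(tokens: List[int], target: NextToken,
                     drafter: NextToken, k: int) -> List[int]:
    """Draft k tokens, then keep the longest prefix the target agrees with."""
    # 1) The cheap drafter proposes k tokens autoregressively.
    draft: List[int] = []
    ctx = list(tokens)
    for _ in range(k):
        t = drafter(ctx)
        draft.append(t)
        ctx.append(t)

    # 2) The target checks each drafted position. A real system would score
    #    all positions in one batched forward pass; this loop only mimics that.
    accepted: List[int] = []
    ctx = list(tokens)
    for t in draft:
        expected = target(ctx)
        if expected != t:
            # First mismatch: emit the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(t)
        ctx.append(t)

    # 3) Every draft matched, so the target's next token comes for free.
    accepted.append(target(ctx))
    return accepted


def generate(prompt: List[int], target: NextToken, drafter: NextToken,
             k: int, max_new: int) -> Tuple[List[int], int]:
    """Run speculative steps until max_new tokens exist; return (tokens, rounds)."""
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < max_new:
        tokens.extend(speculative_step(tokens, target, drafter, k))
        rounds += 1
    return tokens[:len(prompt) + max_new], rounds


if __name__ == "__main__":
    out, rounds = generate([0], target_next, drafter_next, k=3, max_new=8)
    print(out, rounds)
```

In this toy setup the output is identical to what the main model would produce on its own with greedy decoding, but it takes 3 verification rounds instead of 8 single-token steps. Any real speedup depends on how often the drafter's guesses are accepted and how cheap the drafter is relative to the main model.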

Has anyone used this with MLX? I tried, without success. It doesn't seem to be supported yet.
