GGUF with MTP vs. MLX without: is MLX still the way to go for Mac users?

Have any Mac users tested the speed difference (token generation, prompt processing) between MLX quants without MTP and GGUF quants with MTP?
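To make answers comparable, here is a minimal sketch of how the two numbers could be computed from raw timings. It assumes you measure the token counts and wall-clock times yourself (for example from each engine's own log output); the names `RunTiming` and `speedup` are made up for illustration and are not part of MLX, llama.cpp, or any wrapper mentioned here. The values in the example are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class RunTiming:
    """Timings from one benchmark run. All values are measured by the user."""
    prompt_tokens: int          # number of tokens in the prompt
    generated_tokens: int       # number of tokens produced
    prompt_seconds: float       # time spent on prompt processing
    generation_seconds: float   # time spent generating after the prompt

    def prompt_tps(self) -> float:
        """Prompt processing throughput in tokens per second."""
        return self.prompt_tokens / self.prompt_seconds

    def generation_tps(self) -> float:
        """Token generation throughput in tokens per second."""
        return self.generated_tokens / self.generation_seconds


def speedup(candidate_tps: float, baseline_tps: float) -> float:
    """Ratio above 1.0 means the candidate is faster than the baseline."""
    return candidate_tps / baseline_tps


if __name__ == "__main__":
    # Placeholder numbers only; replace with your own measurements.
    mlx_run = RunTiming(1000, 200, 2.0, 10.0)
    gguf_mtp_run = RunTiming(1000, 200, 2.5, 8.0)
    print("prompt speedup:", speedup(gguf_mtp_run.prompt_tps(), mlx_run.prompt_tps()))
    print("generation speedup:", speedup(gguf_mtp_run.generation_tps(), mlx_run.generation_tps()))
```

Using the same model, the same prompt, and a similar quant size for both runs keeps the comparison fair, since MTP should mainly affect generation speed rather than prompt processing.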

More or less once a month I wonder if MLX is still the right path on a Mac. Some reasons:

- LM Studio has bad caching for MLX, and no MTP of course.
- omlx has very good caching plus turboquant and dflash, but no MTP yet (it looks like it is coming soon, since it is already in the dev branch).
- I have discovered two other interesting engine wrappers, rapid-mlx and mtplx, but I haven't tried them yet. The second has MTP.

In general, MLX has no equivalent of llama.cpp: a single engine that has it all, with so many configuration options.

I keep using MLX because it is more efficient on a Mac. But now that MTP is already in llama.cpp, I wonder whether llama.cpp on Metal with MTP would be faster than MLX.

And most importantly, the quant world has more options for GGUF.

I'd appreciate it if anyone has experience or knowledge to share.

submitted by /u/mouseofcatofschrodi