Llama.cpp MTP support now in beta!

Happy to report that llama.cpp MTP support is now in beta, thanks to Aman (and everyone else who has pushed the various issues forward in the meantime). This has the potential to actually get merged soon-ish. It currently supports Qwen3.5 MTP, but other models are likely to follow suit.

Between this and the maturing tensor-parallel support, expect most of the performance gap between llama.cpp and vLLM, at least in token generation speed, to be erased.

submitted by /u/ilintar
