b9200 released – potential MTP PP increase

By /u/Bulky-Priority6824 / May 17, 2026

Testing in progress... we all need an increase in pp 😆

https://github.com/ggml-org/llama.cpp/releases/tag/b9200

am17an commented 13 hours ago:

Overview: Avoid copying the logits for every token in the batch when doing prompt processing for MTP, since it only requires the pre-norm. This reduces memory traffic quite a bit and in turn increases PP speed with MTP.
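To get a feel for why skipping the per-token logits copy could matter, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration, not taken from the release: the batch size, vocab size, and hidden size are made-up round numbers, values are assumed to be 32-bit floats, and the "pre-norm" is assumed to be one hidden-size vector per token. It is not llama.cpp code, only arithmetic on buffer sizes.

```python
# Rough estimate of per-batch copy sizes during prompt processing.
# All sizes below are hypothetical, not read from any real model or from llama.cpp.

def bytes_per_batch(n_tokens: int, width: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to copy one `width`-sized float vector per token."""
    return n_tokens * width * bytes_per_value

N_TOKENS = 2048      # hypothetical prompt-processing batch size
N_VOCAB = 150_000    # hypothetical vocabulary size (width of one logits row)
N_EMBD = 4096        # hypothetical hidden size (width of one pre-norm row)

# Copying a full logits row for every token in the batch.
logits_bytes = bytes_per_batch(N_TOKENS, N_VOCAB)

# Copying only a hidden-size row for every token in the batch.
prenorm_bytes = bytes_per_batch(N_TOKENS, N_EMBD)

MIB = 1024 * 1024
print(f"logits for every token:   {logits_bytes / MIB:.1f} MiB")
print(f"pre-norm for every token: {prenorm_bytes / MIB:.1f} MiB")
print(f"ratio:                    {logits_bytes / prenorm_bytes:.1f}x")
```

With these made-up sizes the gap is simply the vocab-to-hidden-size ratio, so the actual saving depends on the model and on how the buffers are really handled in b9200. Real PP numbers will have to come from benchmarks.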
