b9200 released – potential MTP PP increase

Testing in progress... we all need an increase in PP 😆

https://github.com/ggml-org/llama.cpp/releases/tag/b9200

From the PR, am17an commented 13 hours ago:

Overview: Avoid copying the logits for every token in the batch when doing prompt processing for MTP, since it only requires the pre-norm. This reduces memory traffic quite a bit and in turn increases PP speed with MTP.
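
Rough back-of-envelope for why this matters: a logits row is vocab-sized, while a pre-norm hidden state is only as wide as the model's hidden dimension. The sketch below just compares per-batch copy volume for the two. It assumes float32 values and made-up but plausible dimensions (batch 2048, vocab 151,936, hidden 4,096); none of these numbers come from the PR, and it ignores everything else llama.cpp does per batch.

```python
# Back-of-envelope estimate of per-batch copy volume during prompt processing.
# Assumptions (NOT from the PR): float32 values and illustrative model dimensions.

BYTES_PER_FLOAT32 = 4


def copy_bytes(n_tokens: int, row_width: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Bytes needed to copy one row of `row_width` values for each token in the batch."""
    return n_tokens * row_width * bytes_per_value


def mib(n_bytes: int) -> float:
    """Convert bytes to MiB."""
    return n_bytes / (1024 * 1024)


if __name__ == "__main__":
    n_batch = 2048     # hypothetical prompt-processing batch size
    n_vocab = 151_936  # hypothetical vocab size = width of one logits row
    n_embd = 4_096     # hypothetical hidden size = width of one pre-norm row

    logits_all = copy_bytes(n_batch, n_vocab)   # copying logits for every token
    prenorm_all = copy_bytes(n_batch, n_embd)   # copying only the pre-norm state

    print(f"logits for every token:   {mib(logits_all):8.1f} MiB")
    print(f"pre-norm for every token: {mib(prenorm_all):8.1f} MiB")
    print(f"ratio: {logits_all / prenorm_all:.1f}x")
```

With these made-up numbers the logits copy is about 1187 MiB per batch versus 32 MiB for the pre-norm rows, roughly a 37x difference. Real savings depend on the model and on what the PR actually stopped copying.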

submitted by /u/Bulky-Priority6824
