Testing in progress... we all need an increase in PP
https://github.com/ggml-org/llama.cpp/releases/tag/b9200
am17an commented 13 hours ago:
Overview: Avoid copying the logits for every token in the batch when doing prompt processing for MTP, since it only requires the pre-norm. This reduces memory traffic quite a bit and in turn increases PP speed with MTP.
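For a rough sense of why skipping the logits copy matters, here is a back-of-the-envelope sketch. It is not taken from the PR: it assumes "pre-norm" means the hidden state before the final normalization (one vector of width `n_embd` per token), that both buffers hold 32-bit floats, and it uses made-up sizes (`n_tokens`, `n_vocab`, `n_embd`) picked only for illustration.

```python
# Hypothetical sizes, NOT taken from the PR or from any specific model.
n_tokens = 2048      # tokens in one prompt-processing batch (assumed)
n_vocab = 128_000    # vocabulary size, i.e. logits per token (assumed)
n_embd = 4096        # hidden width, i.e. pre-norm values per token (assumed)


def copy_bytes(n_tokens: int, width: int, bytes_per_value: int = 4) -> int:
    """Bytes moved when copying `width` values for each of `n_tokens` tokens."""
    return n_tokens * width * bytes_per_value


# Copying full logits for every token in the batch.
logits_bytes = copy_bytes(n_tokens, n_vocab)

# Copying only the pre-norm hidden state for every token (assumed meaning).
prenorm_bytes = copy_bytes(n_tokens, n_embd)

print(f"logits copy:   {logits_bytes / 2**20:.1f} MiB")
print(f"pre-norm copy: {prenorm_bytes / 2**20:.1f} MiB")
print(f"ratio:         {logits_bytes / prenorm_bytes:.2f}x")
```

Under these assumptions the ratio is just `n_vocab / n_embd`, so the saving grows with vocabulary size. Real numbers depend on the model and on how the backend actually stores these buffers.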