I saw a post about incoming MTP support in llama.cpp, so I tried it out on an AI Max 395 with 128GB of DDR5-8000.

Result: between 60 and 80 token/s, up from roughly 40 token/s without MTP (the screenshot shows ROCm, but with Vulkan it's more like 40-45 token/s), depending on the subject — some common math prompts seem to be the fastest. PP seems unchanged.

The two GGUFs in the screenshot are almost the same size: around 36GB each.

I have yet to try it on Qwen 3.5 122B, and there will be some tweaks to do with the launch parameters, but it's really impressive!