Hi, AesSedai here - I've put up a PR to support text-to-text inference for MiMo V2.5 with llama.cpp (it should also support Pro; I'll work on those quants after finishing V2.5): https://github.com/ggml-org/llama.cpp/pull/22493

I've also put some quants up on HF (https://huggingface.co/AesSedai/MiMo-V2.5-GGUF): the Q8_0 as well as my usual MoE-optimized quants (for those unfamiliar, that's basically Q8_0 or Q6_K for most of the model, with the FFN expert tensors quanted down). There is a weird NaN issue with the Q4_K_M that I'm looking into; I believe it's the ffn_down_exps tensor on layer 47 (edit: fixed the NaN issue, uploading the working Q4_K_M now!).

Bartowski, Ubergarm, Unsloth, and the rest of our lovely llama quanting cartel should be following up with their own quants in the near future. Since this is pre-merge, there might still be some changes, but hopefully the PR gets reviewed and merged soon. Please let me know if there are any issues.
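For anyone curious what an "MoE-optimized" mix looks like in practice, here's a rough sketch using llama.cpp's llama-quantize with per-tensor type overrides. The tensor-name patterns, target types, and file names below are illustrative assumptions on my part, not my exact recipe:

```shell
# Hypothetical sketch: keep the bulk of the model at Q8_0 and quant only
# the FFN expert tensors down to a smaller type.
# (Patterns, types, and filenames here are assumptions, not the real recipe.)
./llama-quantize \
    --tensor-type ffn_down_exps=q4_k \
    --tensor-type ffn_up_exps=q4_k \
    --tensor-type ffn_gate_exps=q4_k \
    MiMo-V2.5-BF16.gguf MiMo-V2.5-MoE-mix.gguf Q8_0
```

The idea is that the shared attention and dense layers are small relative to the expert FFNs, so keeping them near-lossless costs little while the experts absorb most of the size savings.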