I tried running AesSedai/MiMo-2.5-GGUF:Q4-K-M under llama.cpp (main tree, compiled 36 hours ago).
Hardware: NVIDIA A6000 with 48 GB VRAM + 300 GB CPU RAM.
I had no success: error loading model: missing tensor blk.0.attn_q.weight ...
Is MiMo already supported in llama.cpp?
From what I read, I assumed it runs but is not performance-tuned yet.
Any hints on what I did wrong?
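In case it helps with debugging: here is a small Python sketch of what I could run to see which architecture the quant actually declares and how the block-0 attention tensors are named inside the file. It assumes the `gguf` package that ships with llama.cpp (pip install gguf), and the file path is just a placeholder for wherever the quant was downloaded.

```python
# Sketch: inspect a GGUF file to check whether blk.0.attn_q.weight
# (or a differently named equivalent) is actually present.
# Assumes the `gguf` Python package from the llama.cpp repo.
from gguf import GGUFReader

reader = GGUFReader("MiMo-2.5-Q4_K_M.gguf")  # placeholder path

# Print the declared architecture so it can be compared against
# what the current llama.cpp build supports.
arch_field = reader.fields.get("general.architecture")
if arch_field is not None:
    # For string fields the last part holds the raw UTF-8 bytes.
    print("architecture:", bytes(arch_field.parts[-1]).decode("utf-8"))

# List the first block's attention tensors to see how they are named.
for tensor in reader.tensors:
    if tensor.name.startswith("blk.0.attn"):
        print(tensor.name, tensor.shape)
```

If blk.0.attn_q.weight really is not in the file, I guess the quant itself would be the suspect rather than my llama.cpp build.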
We started using opencoder.
Our primary model is qwen3.6-27b-q8_0 at the moment.
Since qwen3.6-122B is not coming, I wanted to test alternatives that can run on the hardware mentioned above or on a cluster of 2x Strix or 2x DGX.
MiMo 2.5 looks like it outperforms qwen3.6-27b.
Even though we get useful code from the 27B, my naive belief is that the quality of the primary model makes a big difference. That's why I am looking for the best available model for my hardware. Speed is not that important, since the tasks can run overnight.
I am curious: what are others using as a locally hosted primary model?