It’s crazy how many great models and techniques we have now. Finding the right combination of model, weight quant, and KV cache quant for my system is turning into a complex optimization problem.
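To make the search a bit more concrete, here is a rough back-of-the-envelope sketch of how the combinations could be enumerated. It assumes plain full attention in every layer (so the KV cache grows as 2 × layers × KV heads × head dim × context), treats weights as parameters × bits per weight, and adds a flat overhead guess. `ModelSpec`, `overhead_gib`, and the architecture numbers in the usage example are hypothetical placeholders, not the real specs of any model; the actual values would need to come from each model's config. It also ignores offloading MoE experts to system RAM, which changes the picture a lot.

```python
from dataclasses import dataclass
from itertools import product

GIB = 1024 ** 3


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical container; fill in values from the model's own config."""
    name: str
    params_b: float   # total parameters, in billions
    n_layers: int     # layers that keep a KV cache
    n_kv_heads: int   # KV heads per layer (after GQA)
    head_dim: int     # dimension per head


def weight_bytes(spec: ModelSpec, bits_per_weight: float) -> float:
    """Approximate weight size: parameters times average bits per weight."""
    return spec.params_b * 1e9 * bits_per_weight / 8


def kv_cache_bytes(spec: ModelSpec, context_tokens: int, bits_per_elem: float) -> float:
    """KV cache size assuming full attention in every layer (factor 2 = keys + values)."""
    elems = 2 * spec.n_layers * spec.n_kv_heads * spec.head_dim * context_tokens
    return elems * bits_per_elem / 8


def candidates(specs, weight_bits, kv_bits, context_tokens, vram_gib, overhead_gib=1.5):
    """List every (model, weight quant, KV quant) combo with its estimated footprint.

    overhead_gib is an assumed flat allowance for compute buffers, not a measured value.
    """
    rows = []
    for spec, wb, kb in product(specs, weight_bits, kv_bits):
        total_gib = (weight_bytes(spec, wb) + kv_cache_bytes(spec, context_tokens, kb)) / GIB
        total_gib += overhead_gib
        rows.append((spec.name, wb, kb, round(total_gib, 1), total_gib <= vram_gib))
    return rows


if __name__ == "__main__":
    # Placeholder architecture numbers, for illustration only.
    specs = [ModelSpec("hypothetical-dense-27b", 27.0, 64, 8, 128)]
    for row in candidates(specs, weight_bits=[4.5, 6.5], kv_bits=[16, 8],
                          context_tokens=100_000, vram_gib=24):
        print(row)
```

This only answers "does it fit in VRAM", not "how fast is it"; tokens per second still has to be measured, and models with sliding-window or linear-attention layers will need a different KV formula.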
For instance, I have a single 3090 Ti and 128 GB of DDR4 RAM, and I want good speed (20+ t/s) and a large context (100k+). I have these options from just: Qwen 3.5 27B, Qwen 3.5 35B MoE, Qwen coder 80B, Gemma 4 31B, Gemma 4 26B MoE, …and a whole lot more options. Jus…