The main reason is that quantization quality directly affects a model's performance and stability, and therefore its real-world usefulness. Even though GRM-2.6-Plus beats Qwen3.6 27B (the model it derives from) in benchmarks, it gives worse results than an AutoRound Q2_K_mixed quant of Qwen3.6 27B that is practically the same size. This is just one example: most of the quants I tested suffer from the same problems, and only a few of them, mostly ones using a different quantization mechanism, are usable below Q5.

I want to advocate for AutoRound quantization as the standard for lower quants (Q1-Q4). Apex also performed quite well, though its files are larger. Maybe you know of other alternative methods that give consistent results, because standard quants like Q4_K_M don't provide adequate results and often produce buggy behavior overall (looping, hallucinations, inconsistency).

Prompt: Create an SVG image of a pelican riding a bicycle

Multiple examples of different quant results: https://www.reddit.com/r/LocalLLaMA/comments/1szp96f/comment/oj3r4b1/

AutoRound Q2_K_mixed: https://huggingface.co/sphaela/Qwen3.6-27B-AutoRound-GGUF

Regular llama.cpp Q4_K_M: https://huggingface.co/morikomorizz/GRM-2.6-Plus-GGUF

This is just one example, and the output quality is consistently worse when I ask tricky questions: the model hallucinates more, loops, and so on. The community should understand that typical quantization below Q5-Q6 is inadequate for Qwen models unless you tinker with it through some more intelligent mechanism like Intel's AutoRound. In my experience, looping is a direct symptom of broken quantization; occasional syntax errors in agentic coding are another.
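To illustrate why naive low-bit quants degrade so sharply, here is a toy sketch (plain Python, illustrative numbers only, not the actual GGUF or AutoRound implementation) of round-to-nearest (RTN) uniform quantization, the baseline that methods like AutoRound improve on by learning per-weight rounding decisions instead of always rounding to the nearest level:

```python
# Toy sketch: symmetric round-to-nearest (RTN) uniform quantization.
# Shows how reconstruction error grows as bit width shrinks; smarter
# schemes (e.g. AutoRound) tune the rounding to reduce this error.
def rtn_quantize(weights, bits):
    """Map floats onto 2**bits uniform levels and dequantize back."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4-bit, 1 for 2-bit
    scale = max(abs(w) for w in weights) / qmax   # per-group scale factor
    return [round(w / scale) * scale for w in weights]

# Hypothetical weight group, just for demonstration.
weights = [0.31, -0.82, 0.05, 0.67, -0.44, 0.12]
for bits in (8, 4, 2):
    deq = rtn_quantize(weights, bits)
    mse = sum((w - q) ** 2 for w, q in zip(weights, deq)) / len(weights)
    print(f"{bits}-bit RTN  MSE = {mse:.5f}")
```

At 2 bits there are only three usable levels per group, so most weights collapse onto 0 or ±scale and the error explodes, which matches the looping and hallucination symptoms described above; AutoRound-style methods spend extra optimization at quantization time to place each weight on the less obvious but globally better level.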