I am already entirely sick of this approach (from unsloth and others) of "let's be first, because we know people are starving for new models" while never bothering to prove - like most other quant creators - that their quants are any good, e.g. by checking PPL for disastrous faults like NaN, and/or by measuring PPL and KLD. The latest proof of this useless rush is their "UD-Q4_K_XL" of MiniMax-M2.7-GGUF, where a simple PPL measurement shows the model to be utterly broken.

For the people asking what a "NaN" in a quant PPL measurement means: it would normally point to numerical issues with the backend kernels or with the quant itself; here it is a rushed-in, never-checked quant error. I have checked similar quants from other HF providers (aessedai/MiniMax-M2.7-Q5_K_M --> 157.226 GiB (5.906 BPW) and ubergarm/MiniMax-M2.7-IQ5_K --> 157.771 GiB (5.926 BPW)) and no such error is present. So this is not about backend kernels, nor about unsloth's much-hyped "poisoned CUDA 13.2". There are ways to catch these faults before publishing quants in a rush (like …).

Please, Unsloth: get in line with QA, abide by the standards already accepted by the GGUF quanting community on HF, and transparently provide PPL and KLD data. At least do it internally as a hygiene measure to avoid such flops. Rush it not!
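For anyone wanting to automate that hygiene step, here is a minimal sketch of a pre-publish gate. It assumes llama.cpp-style `llama-perplexity` output ending in a line like `Final estimate: PPL = 4.5678 +/- 0.01234`; the function name `ppl_gate` and the `max_ppl` threshold are my own inventions, not anything unsloth or llama.cpp ship:

```python
import math
import re

def ppl_gate(perplexity_log: str, max_ppl: float = 50.0) -> bool:
    """Return True if a quant passes a basic PPL sanity check.

    Parses the final PPL estimate from llama-perplexity-style output
    (assumed format: 'Final estimate: PPL = <value> ...') and rejects
    NaN, infinity, unparseable values, and absurdly high perplexity.
    """
    m = re.search(r"Final estimate: PPL = (\S+)", perplexity_log)
    if not m:
        return False  # no estimate found: treat the run as failed
    try:
        ppl = float(m.group(1))  # float('nan') and float('inf') parse fine
    except ValueError:
        return False  # garbled value: fail closed
    return math.isfinite(ppl) and ppl <= max_ppl

# A healthy quant passes, a broken (NaN) one does not:
print(ppl_gate("Final estimate: PPL = 4.5678 +/- 0.01234"))  # True
print(ppl_gate("Final estimate: PPL = nan +/- nan"))         # False
```

Wiring this into the upload pipeline (fail the job when the gate returns False) would have caught the broken quant before it ever hit HF.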