Qwen 3.6 35B: different quant speeds?

https://preview.redd.it/bixb4erga2wg1.png?width=1464&format=png&auto=webp&s=2df10ab305a5cf4c4252496ec3df34422359066b

This is on an RTX 3090, llama.cpp main, Arch Linux.

So what is everybody's experience so far? I've tested a few quants and llama.cpp forks and came pretty much right back to where I started. I couldn't get higher speed or quality than the UD IQ4 quant. I tried the Apex compact i and the tqr3_4Q.

Even though on paper they should be faster, I couldn't get better results than 120-130 tk/s, so I kind of reverted to what I already had.

The tqr3_4Q fits nicely though, since it's really small, but its quality is about like Q3_K_M, so there's no point in me running it, as I have about 4 GB of VRAM left free even at 260k context.

I noticed a nice speed bump of about 10-15 tk/s going from the (general) temperature settings to the (coding) preset specified by Unsloth.

Has anyone else managed to push it above 130 tk/s on an RTX 3090?

submitted by /u/cviperr33