This is on an RTX 3090, llama.cpp main, Arch Linux. So what is everybody's experience so far? I've tested a few quants and llama.cpp forks and came right back to where I started, pretty much: I couldn't get higher speed or quality than the UD IQ4 quant. I tried the Apex compact i and the tqr3_4Q. Even though on paper they should be faster, I couldn't get better results than 120-130 tk/s, so I reverted to what I already had. The tqr3_4Q fits nicely, though. It's really small, but it's about Q3_K_M quality, so there's no point in me running it, as I have about 4 GB of VRAM left free even at 260k context. I noticed a nice speed bump of about 10-15 tk/s going from the (general) temperature settings to the (coding) preset specified by Unsloth. Has anyone else managed to push it above 130 tk/s on an RTX 3090?