LocalLLaMA

Are i-Quants overrated?

We all know modern "intelligent" quantization, which uses an imatrix to make a Q4_K_XL model feel like Q6_K. But here is what I've noticed: while this works well on most English tasks, the effect can be reversed on other languages or niche tasks…

Speculative Decoding

I've spent the past 30 minutes looking into what speculative decoding is and how it works. I realize that is not much time to understand something, and I hope you will forgive me. I have a cognitive block about this question now that I fee…
