Are you quanting your memory?

Title.

Curious how people are generally handling the KV cache. BF16? Q8? Q4? Turboquant or some other secret sauce?

I run everything in BF16, hoping I'd get fewer hallucinations and because that's what g4 and q3.6 are natively trained on anyway. But I'm very interested to hear whether people are getting good results running Q8 or Q4, or if anyone has had good results with turbo3/4 or similar.
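For a rough sense of what's at stake memory-wise, here's a back-of-the-envelope sketch of KV cache size at each precision. The model shape (32 layers, 8 KV heads, head dim 128, 32k context) is purely hypothetical, so plug in your own model's config. The quantized sizes assume llama.cpp-style q8_0/q4_0 blocks (32 values sharing one 16-bit scale). It also ignores tricks like sliding-window layers that shrink the cache on some models.

```python
# Hypothetical model shape: replace with your model's actual config values.
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
CONTEXT_LEN = 32768

# Bytes per stored value. Quantized entries assume blocks of 32 values
# sharing one 16-bit scale (q8_0: 34 bytes/block, q4_0: 18 bytes/block).
BYTES_PER_VALUE = {
    "bf16": 2.0,
    "q8_0": 34 / 32,
    "q4_0": 18 / 32,
}


def kv_cache_bytes(cache_type: str, context_len: int = CONTEXT_LEN) -> int:
    """Total bytes for the K and V caches across all layers."""
    # Leading factor of 2 counts both the K cache and the V cache.
    n_values = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * context_len
    return int(n_values * BYTES_PER_VALUE[cache_type])


if __name__ == "__main__":
    for cache_type in BYTES_PER_VALUE:
        gib = kv_cache_bytes(cache_type) / 2**30
        print(f"{cache_type}: {gib:.3f} GiB")
```

With those made-up numbers, that works out to about 4 GiB for BF16, 2.125 GiB for Q8, and 1.125 GiB for Q4. So the question is really whether roughly halving or quartering that memory costs noticeable quality.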

submitted by /u/Plastic-Stress-6468
