LocalLLaMA

I’ve updated my glorified Llama fork (LLM Inference Server) for P40s to utilise MTP + TurboQuant + DFlash

submitted by /u/Sakatard