/u/HumanDrone8721 - Provide.ai

Please help improve CPU-only inference speed

/u/HumanDrone8721 / April 25, 2026

This is a request for help, addressed to people who want to run very large models locally at Q8 or better quants at all costs; in my case the cost is inference speed. I have 512GB of DDR4 ECC 2666 with a Threadripper Pro 3945WS that gives me ca. 5-7 tok/s…
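
For context, CPU-only token generation is usually limited by memory bandwidth rather than compute, so a rough ceiling can be estimated from the RAM configuration. The sketch below is only a back-of-the-envelope illustration: the 8 populated memory channels are an assumption (the post does not state the channel count), the function names are hypothetical, and real throughput is typically well below the theoretical peak.

```python
# Back-of-the-envelope bandwidth ceiling for CPU-only decoding.
# Assumptions (not from the post): 8 populated memory channels,
# 64-bit (8-byte) bus per channel, decoding fully bandwidth-bound.

def peak_bandwidth_gb_s(mt_per_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


def max_tokens_per_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on tokens/s if each token requires reading gb_read_per_token."""
    return bandwidth_gb_s / gb_read_per_token


def implied_gb_per_token(bandwidth_gb_s: float, tokens_per_s: float) -> float:
    """Weight data per token that would saturate the given bandwidth at this speed."""
    return bandwidth_gb_s / tokens_per_s


if __name__ == "__main__":
    bw = peak_bandwidth_gb_s(2666, channels=8)  # DDR4-2666, assumed 8 channels
    print(f"theoretical peak bandwidth: {bw:.1f} GB/s")
    for tps in (5, 7):  # the speeds reported in the post
        gb = implied_gb_per_token(bw, tps)
        print(f"{tps} tok/s would saturate the peak at about {gb:.1f} GB read per token")
```

Comparing the implied figure with the number of bytes the model actually touches per token (all weights for a dense model, only the active experts for a mixture-of-experts model) gives a sense of how far the current speed is from the hardware limit.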
