Hello everyone. I'm at a startup with a team of fewer than 10 people. Everyone on the team wants to use AI to speed up their work and iron out issues faster, which LLMs can help with.
We use LLMs for coding, sales presentations, pitch preparation, and design work.
The focus for us in this exercise is to ensure our IP/sensitive data is never fed into closed LLMs or used for training, since that could compromise it. Hence, we are looking to host models locally, like Qwen, Kimi, Gemma, DeepSeek, or Llama (happy to hear if there are better open-source models), and to keep the ability to swap in the latest and best-performing model when needed.
Can you advise us on a couple of things below, based on your experience:
- Which models are good for (a) coding, (b) text generation for reports/PPTs, and (c) image/video generation?
- What hardware should we host on? For example, would a mix of an EPYC 7763 + 1TB of 3200MHz DDR4 + 2x RTX 3090 work?
For the local hosting hardware, we want to start with the minimum possible budget but build it in a way that supports scaling when required.
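For context on the 2x3090 idea (48 GB of VRAM total), here is a rough back-of-the-envelope sketch of how one might sanity-check whether a quantized model fits. This is my own approximation, not a vendor formula: it assumes weights dominate memory use and lumps KV cache plus runtime overhead into a flat ~20%, so treat the numbers as ballpark only.

```python
# Back-of-the-envelope VRAM estimate for a quantized LLM.
# Assumptions (mine, not measured): weights dominate, and KV cache +
# runtime overhead is approximated as a flat 20% on top of the weights.

def vram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Approximate VRAM in GB for a model with `params_b` billion
    parameters quantized to `bits` bits per weight."""
    return params_b * bits / 8 * overhead

# 2x RTX 3090 = 48 GB total VRAM
print(round(vram_gb(70, 4), 1))   # 70B model at 4-bit -> ~42.0 GB (tight fit)
print(round(vram_gb(32, 8), 1))   # 32B model at 8-bit -> ~38.4 GB
print(round(vram_gb(70, 8), 1))   # 70B model at 8-bit -> ~84.0 GB (doesn't fit)
```

By this rough math, 2x3090 can serve ~70B-class models at 4-bit or ~30B-class models at 8-bit, with headroom shrinking fast as context length grows.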
Happy to hear any other suggestions too.