Hello,
I'm currently using Ollama / LM Studio for things like code inference and proofreading emails, etc. I'm definitely not experienced in this space, but I'm looking to grow.
It's been working great, but it's a bit slow at times. I use Gemma 4 / Qwen, and I recently tried OpenBioLLM 70B for some health questions (for testing). I've also hooked up VS Code / JetBrains tools to it, and I use it with Open WebUI so my wife and I each have our own chats going.
I was thinking of trying either vLLM or llama.cpp to see if there are any speed improvements.
Specs: 64 GB RAM + Blackwell 5000, Ubuntu 26.04
I asked ChatGPT which one I should use and it told me to just stick with Ollama :/
Thanks for your time.