Tested how OpenCode Works with SelfHosted LLMS: Qwen 3.5 & 3.6, Gemma 4, Nemotron 3, GLM-4.7 Flash…

I have run two tests on each LLM with OpenCode to check their basic readiness and convenience:

- Create IndexNow CLI in Golang (Easy Task) and

- Create Migration Map for a website following SiteStructure Strategy. (Complex Task)

Tested Qwen 3.5, & 3.6, Gemma 4, Nemotron 3, GLM-4.7 Flash and several other LLMs.

Context size used: 25k-50k - varies between tasks and models.

The result is in the table below, hope you find it useful.

The speed of most of these selfhosted LLMs - on RTX 4080 (16GB VRAM) is below (to give you idea how fast/slow each model is).

Used llama-server with default memory and layers params. Finetuning these might help you to improve speed a bit. Or maybe a bit more than a bit :)

---

My Takeaway:

Qwen 3.5 27b is a very decent LLM that suit my hardware well.

New Gemma 4 26b showed very good results, worth testing more.

Both these are comparable to cloudhosted free LLMs from OpenCode Zen - for these two tasks.

---