I ran two tests on each LLM with OpenCode to check their basic readiness and convenience:

- Create an IndexNow CLI in Golang (easy task)
- Create a migration map for a website following the SiteStructure Strategy (complex task)

I tested Qwen 3.5 and 3.6, Gemma 4, Nemotron 3, GLM-4.7 Flash and several other LLMs. Context size used: 25k-50k, varying between tasks and models.

The results are in the table below; most of the exact quant names are in the speed test table. Hope you find it useful.

---

Here in v2 I added tests of:

- Qwen 3.6 35b q3 and q4 => the result is worse than expected
- Qwen 3 Coder Next => very good result
- Qwen 3.5 27b q3 (Bartowski's quant) => disappointing

The speed of most of these self-hosted LLMs on an RTX 4080 (16GB VRAM) is below, to give you an idea of how fast or slow each model is. I used llama.cpp with the recommended temp, top-p and other params, and default memory and layers params. Tuning these might help you improve speed a bit. Or maybe a bit more than a bit :)

My takeaways from this test iteration:

- Qwen 3.5 27b (Unsloth's quants) is a very decent LLM that suits my hardware well.
- Qwen 3 Coder Next is better than Qwen 3.5 and 3.6 35b.
- Qwen 3.5 and 3.6 35b are good, but not good enough for my tasks.
- Both Gemma 4 26b and 31b showed very good results too, though for self-hosting on 16GB VRAM the 31b variant is too big.

---

The details of each LLM's behaviour in each test are here: https://www.glukhov.org/ai-devtools/opencode/llms-comparison/