Recently I did a little performance test of several LLMs on a PC with 16GB VRAM

Qwen 3.5, Gemma-4, Nemotron Cascade 2 and GLM 4.7 flash.

I tested how generation speed degrades as the context length increases.

I used llama.cpp with quants that fit nicely into the 16GB VRAM of my RTX 4080.
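If you want to compare runs like this yourself, one way to summarize a speed-vs-context table is to express each throughput reading as a slowdown relative to the smallest context. A minimal sketch (the numbers below are hypothetical placeholders, not my measurements):

```python
# Hypothetical tokens/sec readings at increasing context sizes --
# placeholder values only, NOT the measurements from the table above.
throughput = {2048: 60.0, 8192: 45.0, 16384: 30.0}

def degradation(tps: dict) -> dict:
    """Percentage slowdown at each context size vs. the smallest context."""
    base = tps[min(tps)]  # throughput at the shortest context
    return {ctx: round(100 * (1 - t / base), 1) for ctx, t in sorted(tps.items())}

print(degradation(throughput))  # {2048: 0.0, 8192: 25.0, 16384: 50.0}
```

Normalizing this way makes models with very different absolute speeds directly comparable: what matters for long-context use is how steeply each one falls off.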

Here is a result comparison table. Hope you find it useful.

https://preview.redd.it/ylafftgx76tg1.png?width=827&format=png&auto=webp&s=16d030952f1ea710cd3cef65b76e5ad2c3fd1cd3

submitted by /u/rosaccord
