Getting a feel for how fast X tokens/second really is.

I love following all your adventures with local LLM setups. Quality and model size matter, but so does performance. Raw numbers, however, don't convey the experienced speed very well.

If someone claims they run Qwen 3.6-27B at 21 tokens/second, how fast does that actually feel? Is 10 tokens/second unusable? I find these numbers objective but, on their own, meaningless.

I built a script that helps me get a subjective feel for these objective numbers.

It supports three modes: text, code, and reasoning + code.

https://mikeveerman.github.io/tokenspeed/
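The core idea can be sketched in a few lines of Python. This is not the author's actual script (the linked tool runs in the browser); it's a minimal terminal version, and the ~4 characters-per-token ratio is a rough assumption for English text, not a real tokenizer.

```python
import sys
import time


def stream(text, tokens_per_second, chars_per_token=4, out=sys.stdout):
    """Print `text` chunk by chunk at roughly `tokens_per_second`.

    Assumption: one token is about 4 characters, a common rough
    average for English; real tokenizers vary by model and content.
    """
    delay = 1.0 / tokens_per_second
    for i in range(0, len(text), chars_per_token):
        out.write(text[i:i + chars_per_token])
        out.flush()          # show each chunk immediately
        time.sleep(delay)    # pace the output to the target speed
    out.write("\n")


if __name__ == "__main__":
    # Watch what a claimed 10 tokens/second feels like to read.
    stream("This is roughly what ten tokens per second feels like.", 10)
```

Try it with different speeds (e.g. 5 vs. 40 tokens/second) to see where reading comfort breaks down for you.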

submitted by /u/MikeNonect
