9070 XT inference for Q3 Qwen 27B

In llama.cpp I'm getting 12 tok/s. Does this number look right to you, and what can I do to increase it (if possible)?

cd ~/llama.cpp && ./build/bin/llama-server -m models/qwen-3.6-27b-abliterated-q3.gguf -ngl 999 -c 65536 -np 1 -b 512 --ubatch-size 128 -fa on --cache-type-k q4_0 --cache-type-v q4_0 --threads 6 --jinja --no-warmup --host 0.0.0.0 --port 8080

(I need the 65536 context; shrinking it is not an option.)
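One way to sanity-check the 12 tok/s figure is to benchmark the model outside the server, so prompt processing and the 64K KV cache don't muddy the number. A minimal sketch, assuming the llama-bench binary was built alongside llama-server and that these flag spellings match your build (check ./build/bin/llama-bench --help if they don't):

cd ~/llama.cpp && ./build/bin/llama-bench -m models/qwen-3.6-27b-abliterated-q3.gguf -ngl 999 -t 6 -fa 1 -p 512 -n 128

The tg (token generation) result is the raw decode speed with all layers offloaded. If it comes out well above 12 tok/s, the gap is likely coming from the server-side settings (the 65536 context, the q4_0 KV cache, or the batch sizes) rather than the GPU itself; watching rocm-smi during a generation can also show whether the 9070 XT is actually the bottleneck.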
submitted by /u/Ok-Internal9317
