In llama.cpp I'm getting 12 tok/s. Does this number look right to you, and what can I do to increase it (if possible)?
```
cd ~/llama.cpp && ./build/bin/llama-server -m models/qwen-3.6-27b-abliterated-q3.gguf \
  -ngl 999 -c 65536 -np 1 -b 512 --ubatch-size 128 -fa on \
  --cache-type-k q4_0 --cache-type-v q4_0 --threads 6 \
  --jinja --no-warmup --host 0.0.0.0 --port 8080
```

(I need the 65536 context; shrinking it is not an option.)
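For reference, one way to check whether a number like 12 tok/s is hardware-bound rather than server-config-bound is to measure generation speed in isolation with `llama-bench`, which ships with llama.cpp. A minimal sketch, assuming the same model path and a build that includes the `llama-bench` binary (flag values here mirror the server command above and are illustrative, not a recommendation):

```shell
# Benchmark raw prompt-processing (-p) and token-generation (-n) speed,
# using the same offload, thread, and KV-cache settings as the server run.
# -fa takes 0/1 here rather than on/off.
./build/bin/llama-bench -m models/qwen-3.6-27b-abliterated-q3.gguf \
  -ngl 999 -t 6 -fa 1 -ctk q4_0 -ctv q4_0 \
  -p 512 -n 128
```

Comparing the `tg` (token generation) figure from this against the 12 tok/s seen through `llama-server` helps separate model/hardware throughput from server overhead (context size, batching, HTTP path).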