How can I improve inference speed?

Specs:

CPU: Intel Core i5-14400F

RAM: 32 GB DDR4-3200

GPU: NVIDIA RTX 4060 (8 GB)

Current speeds:

Generation (output): 30 tokens/s

Prompt processing (prefill): 500 tokens/s
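To compare settings fairly, it helps to measure both rates the same way every time. Below is a minimal Python sketch, assuming the server is reachable at `http://localhost:8080` and that its `/completion` endpoint returns a `timings` object with `prompt_n`, `prompt_ms`, `predicted_n` and `predicted_ms` fields, as recent llama.cpp builds do. The prompt text, the `n_predict` value and the `cache_prompt: false` setting are illustrative assumptions, not values from the post.

```python
import json
import urllib.error
import urllib.request

# Assumed endpoint of the llama-server started with the command in this post.
URL = "http://localhost:8080/completion"


def speeds_from_timings(timings: dict) -> tuple[float, float]:
    """Return (prefill tokens/s, generation tokens/s) from a llama-server
    `timings` object. The field names are an assumption based on recent builds."""
    prefill_s = timings["prompt_ms"] / 1000.0
    gen_s = timings["predicted_ms"] / 1000.0
    prefill_tps = timings["prompt_n"] / prefill_s if prefill_s > 0 else 0.0
    gen_tps = timings["predicted_n"] / gen_s if gen_s > 0 else 0.0
    return prefill_tps, gen_tps


def measure(prompt: str = "Hello " * 2000, n_predict: int = 256) -> tuple[float, float]:
    """Send one completion request and return the (prefill, generation) rates."""
    payload = {
        "prompt": prompt,
        "n_predict": n_predict,
        # Assumed option: disable prompt caching so each run re-processes the prompt.
        "cache_prompt": False,
    }
    request = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=600) as response:
        body = json.load(response)
    return speeds_from_timings(body["timings"])


if __name__ == "__main__":
    try:
        prefill_tps, gen_tps = measure()
        print(f"prefill: {prefill_tps:.1f} t/s, generation: {gen_tps:.1f} t/s")
    except urllib.error.URLError as exc:
        print(f"could not reach llama-server at {URL}: {exc}")
```

Running it a few times per configuration and averaging the results smooths out run-to-run noise.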

Command I currently use (PowerShell):

.\llama-server.exe `
-m "H:\model\unsloth\Qwen3.6-35B-A3B-GGUF\Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf" `
--host 0.0.0.0 --port 8080 `
--alias "claude-sonnet-4-5" `
-ngl 999 `
--n-cpu-moe 36 `
-c 65535 `
-b 4096 `
-ub 2048 `
-t 6 `
-tb 10 `
--cont-batching `
--mlock `
-ctk turbo4 -ctv turbo3 `
-fa on `
--jinja `
--warmup `
--perf
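One setting in this command is `--n-cpu-moe 36`, which keeps the expert (MoE) weights of the first 36 layers in system RAM. A lower value puts more of them on the GPU, which often speeds up generation as long as the weights, the KV cache for `-c 65535` and the compute buffers still fit in the RTX 4060's 8 GB of VRAM. Below is a minimal Python sketch that regenerates the command above with different `--n-cpu-moe` values so they can be tried one at a time. The helper names `build_args` and `sweep` and the 36 to 24 range in steps of 4 are hypothetical choices; every other flag is copied unchanged from the post.

```python
# Hypothetical helper: rebuild the llama-server command from the post with a
# different --n-cpu-moe value so several values can be tried one after another.
MODEL = r"H:\model\unsloth\Qwen3.6-35B-A3B-GGUF\Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf"


def build_args(n_cpu_moe: int, exe: str = r".\llama-server.exe") -> list[str]:
    """Return the argument list from the post with --n-cpu-moe replaced."""
    return [
        exe,
        "-m", MODEL,
        "--host", "0.0.0.0", "--port", "8080",
        "--alias", "claude-sonnet-4-5",
        "-ngl", "999",
        "--n-cpu-moe", str(n_cpu_moe),  # the only value that changes per run
        "-c", "65535",
        "-b", "4096",
        "-ub", "2048",
        "-t", "6",
        "-tb", "10",
        "--cont-batching",
        "--mlock",
        "-ctk", "turbo4", "-ctv", "turbo3",
        "-fa", "on",
        "--jinja",
        "--warmup",
        "--perf",
    ]


def sweep(start: int = 36, stop: int = 24, step: int = 4) -> list[list[str]]:
    """Commands for --n-cpu-moe = start, start - step, ... down to stop (inclusive)."""
    return [build_args(n) for n in range(start, stop - 1, -step)]


if __name__ == "__main__":
    # Print one command per candidate value; paste each into PowerShell in turn.
    for argv in sweep():
        print(" ".join(argv))
```

Start the server with each printed command, send the same request, and compare the prompt and generation rates it reports. If a value fails to allocate on the GPU, step back up to the previous one.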

https://preview.redd.it/lj58sd33rszg1.png?width=1920&format=png&auto=webp&s=0f7aca149f29f9cb219ea384780a88d191f58ccd

submitted by /u/Askmasr_mod