/u/Askmasr_mod - Provide.ai

How can I improve inference speed?

/u/Askmasr_mod / May 7, 2026

Specs: Core i5-14400F, 32 GB DDR4 RAM at 3200 MHz, RTX 4060. Current speeds: 30 tokens/s for output and 500 tokens/s for prefill. The command I currently use: .\llama-server.exe -m "H:\model\unsloth\Qwen3.6-35B-A3B-GGUF\Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf" …
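To put those two numbers in perspective, here is a rough sketch of how they turn into wall-clock time per request. Only the 500 tokens/s prefill and 30 tokens/s output rates come from my measurements; the prompt and output sizes (4000 and 600 tokens) and the function name are made up for illustration, and the model ignores any overhead outside prefill and decode.

```python
def estimate_latency_seconds(prompt_tokens, output_tokens,
                             prefill_tps=500.0, decode_tps=30.0):
    """Split a request into prefill and decode time, in seconds.

    prefill_tps and decode_tps default to the rates reported in the post.
    prompt_tokens and output_tokens are hypothetical workload sizes.
    """
    if prefill_tps <= 0 or decode_tps <= 0:
        raise ValueError("throughput values must be positive")
    prefill_s = prompt_tokens / prefill_tps   # time to process the prompt
    decode_s = output_tokens / decode_tps     # time to generate the reply
    return prefill_s, decode_s, prefill_s + decode_s


if __name__ == "__main__":
    # Hypothetical request: 4000-token prompt, 600-token reply.
    prefill_s, decode_s, total_s = estimate_latency_seconds(4000, 600)
    print(f"prefill: {prefill_s:.1f} s, decode: {decode_s:.1f} s, total: {total_s:.1f} s")
```

With those assumed sizes, generation takes more time than prompt processing, so the output rate is the number I most want to raise.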

