llama.cpp – native NVFP4 support on Blackwell as of b8967

It looks like we finally have it! Time to test!
https://github.com/ggml-org/llama.cpp/releases/tag/b8967
Platform: RTX 5090 (plus an RTX 5060 Ti, not used during the test), Ryzen 9 9950X3D, 128 GB DDR5-5600 CL36.
TEST:
CUDA_VISIBLE_DEVICES=0 /home/marcin/llama.cpp/llama-bench \
  -m /home/marcin/llama.cpp_models/Qwen3.6-27B-NVFP4/Qwen3.6-27B-NVFP4.gguf \
  -ngl 999 \
  -fa 1 \
  -p 512,2048 \
  -n 128,512 \
  -d 0,4096,8192,16384,32768 \
  -r 5 \
  -o md | tee /home/marcin/qwen3.6-27b-nvfp4-gpu0-bench-depth.md

| model            |      size |  params | backend | ngl | fa |             test |              t/s |
| ---------------- | --------: | ------: | ------- | --: | -: | ---------------: | ---------------: |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |            pp512 | 5546.93 ± 220.29 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |           pp2048 |   5594.58 ± 7.70 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |            tg128 |     73.62 ± 0.16 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |            tg512 |     73.68 ± 0.05 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |    pp512 @ d4096 | 5232.92 ± 144.37 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |   pp2048 @ d4096 |   5272.82 ± 7.11 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |    tg128 @ d4096 |     72.47 ± 0.16 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |    tg512 @ d4096 |     72.50 ± 0.06 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |    pp512 @ d8192 | 4995.34 ± 135.04 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |   pp2048 @ d8192 |   5005.44 ± 4.18 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |    tg128 @ d8192 |     71.57 ± 0.18 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |    tg512 @ d8192 |     71.61 ± 0.06 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |   pp512 @ d16384 | 4537.54 ± 129.55 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |  pp2048 @ d16384 |   4547.25 ± 3.11 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |   tg128 @ d16384 |     70.04 ± 0.16 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |   tg512 @ d16384 |     69.90 ± 0.06 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |   pp512 @ d32768 |  3586.58 ± 71.03 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |  pp2048 @ d32768 |   3560.58 ± 2.65 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |   tg128 @ d32768 |     66.88 ± 0.11 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA    | 999 |  1 |   tg512 @ d32768 |     66.98 ± 0.02 |
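
To make the depth scaling easier to read, here is a small sketch that expresses each result as a fraction of the depth-0 throughput. The numbers are the mean t/s values copied from the pp2048 and tg512 rows of the table; the `retention` helper is just an illustrative name, and the ± spread is ignored for simplicity.

```python
# Mean tokens/s from the table above, keyed by context depth (-d).
results = {
    "pp2048": {0: 5594.58, 4096: 5272.82, 8192: 5005.44, 16384: 4547.25, 32768: 3560.58},
    "tg512": {0: 73.68, 4096: 72.50, 8192: 71.61, 16384: 69.90, 32768: 66.98},
}


def retention(by_depth):
    """Hypothetical helper: throughput at each depth relative to depth 0."""
    base = by_depth[0]
    return {depth: tps / base for depth, tps in by_depth.items()}


for test, by_depth in results.items():
    for depth, frac in retention(by_depth).items():
        print(f"{test:>7} @ d{depth:<6} {frac:6.1%} of depth-0 throughput")
```

On these numbers, prompt processing at a 32768-token depth keeps roughly 64% of its depth-0 speed, while token generation keeps roughly 91%.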
submitted by /u/mossy_troll_84