llama.cpp – native NVFP4 support on Blackwell is now available

It looks like we finally have it! Time to test!
https://github.com/ggml-org/llama.cpp/releases/tag/b8967
Platform: RTX 5090 (plus an RTX 5060 Ti, not used during the test), Ryzen 9 9950X3D, 128 GB DDR5-5600 CL36.
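Test command:
# GPU 0 only, all layers offloaded (-ngl 999), flash attention on (-fa 1),
# prompt sizes 512 and 2048, generation lengths 128 and 512,
# context depths 0 to 32768, 5 repetitions, Markdown output.
CUDA_VISIBLE_DEVICES=0 /home/marcin/llama.cpp/llama-bench \
  -m /home/marcin/llama.cpp_models/Qwen3.6-27B-NVFP4/Qwen3.6-27B-NVFP4.gguf \
  -ngl 999 \
  -fa 1 \
  -p 512,2048 \
  -n 128,512 \
  -d 0,4096,8192,16384,32768 \
  -r 5 \
  -o md | tee /home/marcin/qwen3.6-27b-nvfp4-gpu0-bench-depth.md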

model	size	params	backend	ngl	fa	test	t/s
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	pp512	5546.93 ± 220.29
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	pp2048	5594.58 ± 7.70
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	tg128	73.62 ± 0.16
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	tg512	73.68 ± 0.05
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	pp512 @ d4096	5232.92 ± 144.37
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	pp2048 @ d4096	5272.82 ± 7.11
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	tg128 @ d4096	72.47 ± 0.16
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	tg512 @ d4096	72.50 ± 0.06
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	pp512 @ d8192	4995.34 ± 135.04
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	pp2048 @ d8192	5005.44 ± 4.18
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	tg128 @ d8192	71.57 ± 0.18
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	tg512 @ d8192	71.61 ± 0.06
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	pp512 @ d16384	4537.54 ± 129.55
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	pp2048 @ d16384	4547.25 ± 3.11
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	tg128 @ d16384	70.04 ± 0.16
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	tg512 @ d16384	69.90 ± 0.06
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	pp512 @ d32768	3586.58 ± 71.03
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	pp2048 @ d32768	3560.58 ± 2.65
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	tg128 @ d32768	66.88 ± 0.11
qwen35 27B NVFP4	17.50 GiB	26.90 B	CUDA	999	1	tg512 @ d32768	66.98 ± 0.02
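
To put the depth scaling in perspective, the numbers above can be turned into a relative slowdown. This is only a rough sketch: the helper name `pct_drop` is hypothetical, and the inputs are the mean t/s values copied by hand from the table (depth 0 vs. depth 32768), ignoring the ± spread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage drop from a baseline throughput ($1)
# to a later throughput ($2), rounded to one decimal place.
pct_drop() {
    awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (base - cur) / base * 100 }'
}

# Mean t/s values copied from the table above (depth 0 vs. d32768).
echo "pp2048 drop: $(pct_drop 5594.58 3560.58)%"
echo "tg512 drop:  $(pct_drop 73.68 66.98)%"
```

With these inputs, prompt processing loses roughly 36% of its throughput at a depth of 32768 tokens, while token generation loses only about 9%.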

submitted by /u/mossy_troll_84