Nothing exhaustive... but I thought I'd report what I've seen from early testing.
I'm running a modified version of vLLM that has NVFP4 support for gemma4-26b. Weights come in around 15.76 GiB and the remainder is KV cache. I'm running full context as well.
For a "storytelling" prompt with raw output and no thinking, I'm seeing about 150 t/s on token generation (TG).
TTFT in streaming mode is about 80ms.
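For anyone who wants to compare numbers, here is a rough sketch of how TTFT and TG speed can be computed from a streaming response. It is not the exact harness used for the figures above. `measure_stream` is a hypothetical helper name, and it assumes each streamed chunk carries roughly one token, which is only an approximation for most servers.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Return (ttft_seconds, decode_tokens_per_second, chunk_count).

    Assumption: one chunk is treated as one token. TTFT is the delay until
    the first chunk arrives; decode speed is measured over the chunks that
    follow it, so prefill time is not counted against TG.
    """
    start = clock()
    first = None
    last = start
    count = 0
    for _ in chunks:
        now = clock()
        if first is None:
            first = now  # arrival time of the first chunk
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no chunks")
    ttft = first - start
    decode_time = last - first
    # With a single chunk there is no decode interval to measure.
    tps = (count - 1) / decode_time if count > 1 and decode_time > 0 else 0.0
    return ttft, tps, count
```

Pass it any iterator that yields text pieces as they arrive from the server (for example the content deltas of a streaming chat completion), and start iterating right after sending the request so the first clock reading lines up with it.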
Quality is good!