RTX 5090 gemma4-26b TG performance report

Nothing exhaustive... but I thought I'd report what I've seen from early testing.

I'm running a modified version of vLLM that has NVFP4 support for gemma4-26b. The weights come to about 15.76 GiB, and the rest of the VRAM goes to KV cache. I'm running at full context length as well.
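For anyone who wants to estimate how much room that leaves for KV cache, here is a rough back-of-the-envelope sketch. Only the 15.76 GiB weight figure comes from my run. The 32 GiB card size is the nominal RTX 5090 spec, the 0.90 fraction is an assumption (it matches vLLM's default `gpu_memory_utilization`, but I'm not stating what I actually set), and `overhead_gib` is a hypothetical placeholder for activations and other runtime buffers.

```python
def kv_cache_budget_gib(
    total_vram_gib: float,
    utilization: float,
    weights_gib: float,
    overhead_gib: float = 0.0,
) -> float:
    """Estimate the memory left for KV cache, in GiB.

    Assumes the engine claims `utilization` of total VRAM, then spends it
    on weights, runtime overhead, and KV cache, in that order.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    budget = total_vram_gib * utilization - weights_gib - overhead_gib
    # A negative result means the model does not fit under these assumptions.
    return max(budget, 0.0)


if __name__ == "__main__":
    # 32 GiB and 0.90 are assumptions; 15.76 GiB is the measured weight size.
    estimate = kv_cache_budget_gib(32.0, 0.90, 15.76)
    print(f"approx. KV cache budget: {estimate:.2f} GiB")
```

The real number will be lower once activation memory and CUDA graph buffers are accounted for, so treat this as an upper bound rather than a measurement.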

For a storytelling prompt with raw output and thinking disabled, I'm seeing about 150 t/s on token generation (TG).
Time to first token (TTFT) in streaming mode is about 80 ms.
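If you want to compare numbers, here is a minimal sketch of how TTFT and TG rate can be derived from a streamed response. It is not the exact script I used. It assumes you record one wall-clock timestamp when the request is sent and one per streamed token as it arrives; the function name `streaming_stats` is just illustrative.

```python
from typing import Sequence, Tuple


def streaming_stats(
    request_sent: float, token_times: Sequence[float]
) -> Tuple[float, float]:
    """Return (ttft_seconds, decode_tokens_per_second).

    request_sent: timestamp (seconds) when the request was issued.
    token_times:  arrival timestamp (seconds) of each streamed token.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two token timestamps")

    # Time to first token: the delay before anything comes back.
    ttft = token_times[0] - request_sent

    # Decode rate: tokens after the first, divided by the time they took.
    # The first token is excluded so prefill time does not skew the rate.
    decode_window = token_times[-1] - token_times[0]
    if decode_window <= 0:
        raise ValueError("token timestamps must be increasing")
    tokens_per_second = (len(token_times) - 1) / decode_window

    return ttft, tokens_per_second
```

Timestamps from `time.perf_counter()` work well here. Note that some servers batch several tokens into one streamed chunk, so count tokens rather than chunks if you want a comparable t/s figure.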

Quality is good!

submitted by /u/Nice_Cellist_7595