LocalLLaMA

RTX 5090 gemma4-26b TG performance report

Nothing exhaustive… but I thought I'd report what I've seen from early testing. I'm running a modified version of vLLM that has NVFP4 support for gemma4-26b. Weights come in at around 15.76 GiB, and the rest of the VRAM vLLM is allowed to use goes to KV cache. I'm running…
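
For anyone who wants to sanity-check the KV cache headroom, here's a rough back-of-the-envelope sketch. The 15.76 GiB weight figure is from my run; everything else is an assumption for illustration: a nominal 32 GiB of VRAM on the 5090, vLLM's default `gpu_memory_utilization` of 0.90, and a made-up `overhead_gib` knob standing in for activations, CUDA graphs and so on. The function name `kv_cache_budget_gib` is hypothetical, not anything from vLLM.

```python
def kv_cache_budget_gib(total_vram_gib, weights_gib,
                        gpu_memory_utilization=0.90, overhead_gib=0.0):
    """Rough estimate of the memory left over for KV cache, in GiB.

    Assumption: the engine may use total_vram_gib * gpu_memory_utilization,
    and whatever is not taken by weights or other overhead goes to KV cache.
    """
    usable = total_vram_gib * gpu_memory_utilization
    # Clamp at zero so an oversized model reports "no room" instead of a negative number.
    return max(usable - weights_gib - overhead_gib, 0.0)


# Assumed: 32 GiB nominal VRAM; 15.76 GiB weights as reported above.
budget = kv_cache_budget_gib(total_vram_gib=32.0, weights_gib=15.76)
print(f"approx. KV cache budget: {budget:.2f} GiB")
```

With those assumed numbers it works out to roughly 13 GiB for KV cache. The real figure will be lower once runtime overhead is counted, and it shifts if you change the utilization setting.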