Gemma4 26B A4B NVFP4 GGUF

Hey everyone!

I’ve just uploaded a GGUF version of nvidia/Gemma-4-26B-A4B-NVFP4. The main branch of llama.cpp can’t run it yet, so I’ve also built a Docker image with the necessary changes: catlilface/llama.cpp:gemma4_26b_nvfp4.

Unfortunately, I don’t have any resources other than my 5070Ti to properly test this model, so your feedback is highly welcome.

Special thanks to ynankani for his contribution to llama.cpp, which made this quantization possible.

Note that there are currently performance issues with CPU offloading.

HF repo: https://huggingface.co/catlilface/Gemma-4-26B-A4B-NVFP4-GGUF
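For anyone who wants to try it, here’s a rough sketch of how I’d run it with the image above. The model filename, the entrypoint inside the image, and the port are my assumptions — check the HF repo for the actual GGUF filename, and adjust `-ngl` to however many layers fit on your GPU:

```shell
# Pull the custom llama.cpp build (main branch won't load this quant yet)
docker pull catlilface/llama.cpp:gemma4_26b_nvfp4

# Assumes: the GGUF from the HF repo has been downloaded into ./models,
# the image provides llama-server on its PATH, and your Docker install
# has GPU support (nvidia-container-toolkit). Filename is illustrative.
docker run --gpus all -p 8080:8080 \
  -v "$(pwd)/models:/models" \
  catlilface/llama.cpp:gemma4_26b_nvfp4 \
  llama-server -m /models/Gemma-4-26B-A4B-NVFP4.gguf \
    -ngl 99 --host 0.0.0.0 --port 8080
```

With a 16 GB card like my 5070 Ti you’ll likely need to lower `-ngl` and offload some layers to the CPU — which is exactly where the performance issues mentioned below show up.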

submitted by /u/catlilface69