Get 30K more context using Q8 mmproj with Gemma 4

Hey guys, quick follow-up to my post yesterday about running Gemma 4 26B.

I kept testing and realized you can just use the Q8_0 mmproj for vision instead of the F16 one. There is no quality drop, and it actually performed a bit better in a few of my tests (with --image-min-tokens 300 --image-max-tokens 512). Since Q8_0 weights are roughly half the size of F16, the VRAM the smaller mmproj frees goes straight to the KV cache: you can easily hit 60K+ total context with an FP16 cache and still keep vision enabled.

Here is the Q8 mmproj I used: https://huggingface.co/prithivMLmods/gemma-4-26B-A4B-it-F32-GGUF/blob/main/GGUF/gemma-4-26B-A4B-it.mmproj-q8_0.gguf
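Here's roughly what the launch looks like (not my exact command: the main-model filename, quant, context size, and -ngl value are placeholders to adapt to your setup; the mmproj is the file linked above):

```
# Rough launch sketch. The main-model filename, quant, context size, and
# -ngl value are placeholders; the mmproj is the Q8_0 file linked above.
# The KV cache defaults to f16, so no extra cache flag is needed.
llama-server \
  -m ./gemma-4-26B-A4B-it.Q4_K_M.gguf \
  --mmproj ./gemma-4-26B-A4B-it.mmproj-q8_0.gguf \
  -c 65536 \
  --image-min-tokens 300 \
  --image-max-tokens 512 \
  -ngl 99
```

To sanity-check that vision still works, you can hit the OpenAI-compatible endpoint (default port 8080) with an image passed as a base64 data URI:

```
# Quick vision smoke test against the default port (8080). Assumes a local
# test.jpg and GNU base64 (on macOS use: base64 -i test.jpg | tr -d '\n').
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{
    \"messages\": [{
      \"role\": \"user\",
      \"content\": [
        {\"type\": \"text\", \"text\": \"Describe this image.\"},
        {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/jpeg;base64,$(base64 -w0 test.jpg)\"}}
      ]
    }]
  }"
```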

Link to the original post (and huge thanks to this comment for the tip!).

Quick heads-up: regarding the regression in builds after b8660, a fix has already been approved and should be merged soon, so update llama.cpp once it lands.
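If you build from source, picking up the fix once it merges is just a pull and rebuild; this sketch assumes a standard CMake build in ./build:

```
# Pull the merged fix and rebuild (assumes llama.cpp was configured with
# cmake -B build; adjust for your own build setup).
git pull
cmake --build build --config Release -j
```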
