LocalLLaMA

Gemma 4 for 16 GB VRAM

I think the 26B A4B MoE model is the best fit for 16 GB of VRAM. I tested many quantizations, and if you want to keep vision support, I think the best one currently is: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/blob/main/gemma-4-26B-A4B-it-UD-IQ4_XS.gguf …
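A minimal sketch of loading a GGUF like this with llama-cpp-python, assuming the quant fits in 16 GB with full GPU offload; the local file name, context size, and prompt are placeholders, not part of the original post:

```python
# Sketch: run a downloaded GGUF quant locally with llama-cpp-python.
# Assumes the file from the Hugging Face link above has already been downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-4-26B-A4B-it-UD-IQ4_XS.gguf",  # placeholder local path
    n_gpu_layers=-1,   # offload every layer to the GPU
    n_ctx=8192,        # context window; lower it if you run out of VRAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize this paragraph in one sentence: ..."}]
)
print(out["choices"][0]["message"]["content"])
```

Vision inputs would need the model's separate projector file on top of this; the sketch only covers the text path.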

LocalLLaMA

Gemma 4 vs Whisper

I'm working on building live closed captions for Discord calls for my TTRPG group. With Gemma able to do voice transcription and translation, does it still make sense to run Whisper plus a smaller model for translation? Is it better, faster, or has some…
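For reference, a minimal sketch of the two-stage pipeline the post is weighing against an all-in-one model: Whisper for transcription, then a small local model for translation. The Whisper size, the translator GGUF, and the `caption` helper are assumptions for illustration, not anything from the original post:

```python
# Sketch: Whisper handles audio -> text, a small local LLM handles translation.
import whisper
from llama_cpp import Llama

asr = whisper.load_model("small")                    # speech-to-text stage
translator = Llama(model_path="translator.gguf",     # placeholder translation model
                   n_gpu_layers=-1)

def caption(audio_path: str, target_lang: str = "English") -> str:
    text = asr.transcribe(audio_path)["text"]
    prompt = f"Translate the following to {target_lang}:\n{text}"
    out = translator.create_chat_completion(
        messages=[{"role": "user", "content": prompt}]
    )
    return out["choices"][0]["message"]["content"]

print(caption("clip.wav"))
```

For live captions the audio would arrive in short chunks rather than a single file, but the stage boundary is the same: whatever the transcription step emits is what the translation step sees.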

LocalLLaMA

it's all about the harness

over the course of the arc of local model history (the past six weeks) we have reached a plateau with models and quantization that would have left our ancient selves (back in the 2025 dark ages) stunned and gobsmacked at the progress we currently enjoy…
