LocalLLaMA

My biggest Issue with the Gemma-4 Models is the Massive KV Cache!!

/u/Iory1998 / April 3, 2026

I mean, I have 40GB of VRAM and I still cannot fit the entire Unsloth Gemma-4-31B-it-UD-Q8 (35GB), even at 2K context, unless I quantize the KV cache to Q4. WTF? For comparison, I can fit the entire UD-Q8 Qwen3.5-27B at full context witho…
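For readers who want to sanity-check numbers like these, here is a rough sketch of the usual back-of-the-envelope KV cache estimate for a plain full-attention transformer: 2 (keys and values) × layers × KV heads × head dimension × context length × bytes per element. The layer count, head count, and head dimension below are hypothetical placeholders, not Gemma 4's actual configuration, and the function name `kv_cache_bytes` is made up for illustration. The per-element sizes for `q8_0` and `q4_0` follow the llama.cpp block layouts (34 and 18 bytes per 32 elements). Real usage will differ with sliding-window layers, compute buffers, and runtime overhead.

```python
# Rough KV cache size estimate for a plain full-attention transformer.
# All model dimensions used below are HYPOTHETICAL placeholders,
# not the real Gemma 4 or Qwen configuration.

# Approximate bytes per cached element for common cache types.
# q8_0 / q4_0 follow llama.cpp block layouts: 32 elements + one fp16 scale.
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,
    "q4_0": 18 / 32,
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, cache_type="f16"):
    """Return the estimated KV cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    """
    elements = 2 * n_layers * n_kv_heads * head_dim * context_len
    return elements * BYTES_PER_ELEMENT[cache_type]


if __name__ == "__main__":
    # Hypothetical model: 48 layers, 8 KV heads, head_dim 128, 2K context.
    for cache_type in ("f16", "q8_0", "q4_0"):
        size = kv_cache_bytes(48, 8, 128, 2048, cache_type)
        print(f"{cache_type}: {size / 2**20:.1f} MiB")
```

Adding the result to the size of the model weights gives a first guess at whether a given context length fits in VRAM. It is only a lower bound, since it leaves out activation and compute buffers.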

Gemma 4 31B at 256K Full Context on a Single RTX 5090 — TurboQuant KV Cache Benchmark

/u/PerceptionGrouchy187 / April 3, 2026

Just got Gemma 4 31B running at full 256K context on a single RTX 5090 using TurboQuant KV cache compression.

System Specs
GPU: NVIDIA GeForce RTX 5090 (32GB VRAM)
CPU: AMD Ryzen 9 9950X3D (16-core)
RAM: 64GB DDR5
OS: Windows 11 …

Netflix just dropped their first public model on Hugging Face: VOID: Video Object and Interaction Deletion

/u/Nunki08 / April 3, 2026

Hugging Face netflix/void-model: https://huggingface.co/netflix/void-model
Project page – GitHub: https://github.com/Netflix/void-model
Demo: https://huggingface.co/spaces/sam-motamed/VOID

Gemma 4 is fine great even …

/u/ThinkExtension2328 / April 3, 2026

Been playing with the new Gemma 4 models. It's amazing, great even, but boy did it make me appreciate the level of quality the Qwen team produced, and I'm able to have much larger context windows on my standard consumer hardware.

qwen 3.6 voting

/u/jacek2023 / April 3, 2026

I am afraid you have to use X, guys: https://x.com/ChujieZheng/status/2039909486153089250

Announcing LocalLlama discord server & bot!

/u/HOLUPREDICTIONS / August 13, 2025

INVITE: https://discord.gg/rC922KfEwj

There used to be one old Discord server for the subreddit, but it was deleted by the previous mod. Why? The subreddit has grown to 500k users – inevitably, some users like a niche community with more technical…