LocalLLaMA

Built a zero-allocation, header-only C++ Qwen tokenizer that is nearly 20x faster than OpenAI's tiktoken

/u/yassa9 / April 3, 2026

I'm into HPC and C++ software that is static, zero-allocation, and zero-dependency. I was studying how BPE tokenizers work, so I decided to build this project. I hardcoded the Qwen tokenizer for LLM developers. I really know that whole Tokenizati…

Smaller models are getting scary good.

/u/Numerous-Campaign844 / April 3, 2026

I am still processing this lol. I gave both Gemini 3 Deepthink and Gemma 4 (31B) the exact same complex security puzzle (which was secretly an unwinnable paradox). Gemini completely fell for the trap. It spit out this incredibly professional-look…

Visual Guide to Gemma 4

/u/jacek2023 / April 3, 2026

source: https://x.com/osanseviero/status/2040105484061954349 https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-gemma-4

My biggest Issue with the Gemma-4 Models is the Massive KV Cache!!

/u/Iory1998 / April 3, 2026

I mean, I have 40GB of VRAM and I still cannot fit the entire Unsloth Gemma-4-31B-it-UD-Q8 (35GB) even at 2K context size unless I quantize the KV cache to Q4? WTF? For comparison, I can fit the entire UD-Q8 Qwen3.5-27B at full context witho…

Gemma 4 31B at 256K Full Context on a Single RTX 5090 — TurboQuant KV Cache Benchmark

/u/PerceptionGrouchy187 / April 3, 2026

Just got Gemma 4 31B running at full 256K context on a single RTX 5090 using TurboQuant KV cache compression.

System Specs
GPU: NVIDIA GeForce RTX 5090 (32GB VRAM)
CPU: AMD Ryzen 9 9950X3D (16-core)
RAM: 64GB DDR5
OS: Windows 11 …

Netflix just dropped their first public model on Hugging Face: VOID: Video Object and Interaction Deletion

/u/Nunki08 / April 3, 2026

Hugging Face netflix/void-model: https://huggingface.co/netflix/void-model
Project page – GitHub: https://github.com/Netflix/void-model
Demo: https://huggingface.co/spaces/sam-motamed/VOID

Gemma 4 is fine great even …

/u/ThinkExtension2328 / April 3, 2026

Been playing with the new Gemma 4 models. It's amazing, great even, but boy did it make me appreciate the level of quality the Qwen team produced, and I'm able to have much larger context windows on my standard consumer hardware.

qwen 3.6 voting

/u/jacek2023 / April 3, 2026

I am afraid you have to use X, guys: https://x.com/ChujieZheng/status/2039909486153089250

Announcing LocalLlama discord server & bot!

/u/HOLUPREDICTIONS / August 13, 2025

INVITE: https://discord.gg/rC922KfEwj There used to be an old Discord server for the subreddit, but it was deleted by the previous mod. Why? The subreddit has grown to 500k users – inevitably, some users like a niche community with more technical…