It solved an issue with a script that pulls real-time data from nvidia-smi; Gemini 3.1 actually failed to fix it, even in a fresh session, lol. It's kind of mind-blowing that in 2026 we already have stable local models with 200k+ context!

I tested it by feeding it as many Reddit posts, random documentation files, and raw files from the llama.cpp repo as possible, to push usage up and see how it affects my VRAM. Even during this testing, Gemma kept its mind intact! At 245,283 / 262,144 (94%) context, if I ask it what a specific user said, it matches the quote perfectly and answers within 2-5 seconds.

From previous tests, I found I had to decrease the temperature and bump the repeat penalty to 1.17/1.18 so it doesn't fall into a loop of self-questioning. Above 100k context, it used to start looping through its own thoughts and arguing with itself; instead of giving a final answer, it would just go on forever. These settings helped a lot!

I'm using the latest llama.cpp (which gets updates almost every hour) and the latest Unsloth GGUF from 2-6 hours ago, so make sure to redownload!

Model: gemma-4-26B-A4B-it-UD-IQ4_NL.gguf, unsloth (unsloth bis)

What else can I test? Honestly, I ran out of ideas to crash it! It just gulps and gulps whatever I throw at it.
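For anyone curious what "a script that pulls real-time data from nvidia-smi" might look like, here's a minimal sketch of that kind of monitor. This is my own illustration, not OP's actual script: the function names (`parse_gpu_line`, `poll_nvidia_smi`) and the one-second interval are assumptions; the `nvidia-smi` query flags themselves are standard.

```python
import subprocess
import time

def parse_gpu_line(line):
    # Parse one CSV line produced by the nvidia-smi query below,
    # e.g. "12345, 87" -> VRAM used (MiB) and GPU utilization (%).
    used_mib, util_pct = (int(x) for x in line.split(","))
    return {"vram_used_mib": used_mib, "gpu_util_pct": util_pct}

def poll_nvidia_smi(interval_s=1.0, n_samples=5):
    # Query VRAM usage and GPU utilization a fixed number of times.
    for _ in range(n_samples):
        out = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=memory.used,utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        for i, line in enumerate(out.splitlines()):
            stats = parse_gpu_line(line)
            print(f"GPU{i}: {stats['vram_used_mib']} MiB used, "
                  f"{stats['gpu_util_pct']}% util")
        time.sleep(interval_s)
```

Call `poll_nvidia_smi()` on a machine with an NVIDIA GPU to watch VRAM climb as the context fills.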
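And here's roughly how those sampler settings translate into a llama.cpp launch. This is a sketch, not OP's exact command: the model path and port are placeholders, and the temperature value is my guess since the post only says "decrease the temperature"; the repeat penalty and context size are from the post.

```shell
# Sketch of a llama-server launch with the post's anti-looping settings.
# --temp 0.6 is an assumed value; 1.17 repeat penalty and 262144 context
# come from the post.
./llama-server \
  -m gemma-4-26B-A4B-it-UD-IQ4_NL.gguf \
  -c 262144 \
  --temp 0.6 \
  --repeat-penalty 1.17 \
  --port 8080
```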