LocalLLaMA

GLM-5.1 on HF

/u/Soft-Wedding4595 / April 7, 2026

GLM 5.1 will come out any second lads

GLM-5.1 – a zai-org Collection

/u/adefa / April 7, 2026

Empty collection so far 🙂

You guys seen this? beats turboquant by 18%

/u/OmarBessa / April 7, 2026

https://github.com/Dynamis-Labs/spectralquant Basically, they discard 97% of the KV cache key vectors after figuring out which ones have the most signal.

Auto-creation of agent SKILLs from observing your screen via Gemma 4 for any agent to execute and self-improve

/u/Objective_River_5218 / April 7, 2026

AgentHandover is an open-source Mac menu bar app that watches your screen through Gemma 4 (running locally via Ollama) and turns your repeated workflows into structured Skill files that any agent can follow. I built it because every time I wanted…

DFlash: Block Diffusion for Flash Speculative Decoding.

/u/Total-Resort-3120 / April 7, 2026

https://z-lab.ai/projects/dflash/ https://github.com/z-lab/dflash https://huggingface.co/collections/z-lab/dflash

You can now fine-tune Gemma 4 locally 8GB VRAM + Bug Fixes

/u/danielhanchen / April 7, 2026

Hey guys, you can now fine-tune Gemma 4 E2B and E4B in our free Unsloth notebooks! You need 8GB VRAM to train Gemma-4-E2B locally. Unsloth trains Gemma 4 ~1.5x faster with ~50% less VRAM than FA2 setups: https://github.com/unslothai/unsloth We al…

Training a 1.1B SLM at home

/u/JordanJtech / April 7, 2026

Hey all. Thought I'd share my journey. I've been fascinated with AI and LLMs, and started building apps for consumer devices (phones) and realized the market for fast, usable models for consumer hardware has felt more like an afterthought…

TurboQuant – Extreme KV Cache Quantization · ggml-org/llama.cpp · Discussion #20969

/u/pmttyji / April 7, 2026

14+ independent validators now across Metal, CUDA, HIP, Vulkan, and MLX. Apple Silicon, NVIDIA (4090, 5090, H100, A100, V100, 1080 Ti), AMD (RX 9070 XT, RX 6600). from M1 to Blackwell. this is what open source research looks like. the data converges. …

Meta AI Releases EUPE

/u/techlatest_net / April 7, 2026

A Compact Vision Encoder Family Under 100M Parameters That Rivals Specialist Models Across Image Understanding, Dense Prediction, and VLM Tasks. Link: https://github.com/facebookresearch/EUPE

Gemma 4 31B GGUF quants ranked by KL divergence (unsloth, bartowski, lmstudio-community, ggml-org)

/u/oobabooga4 / April 7, 2026

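For anyone unsure what "ranked by KL divergence" means in practice, here is a minimal sketch of the general idea, not the post's actual methodology. It assumes you already have next-token probability distributions from a reference model and from each quant; the names `quant_a` and `quant_b` and all the numbers are made up for illustration.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length.

    p is the reference distribution, q the approximation. eps guards
    against division by zero when q assigns zero probability to a token.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token probabilities over a 3-token vocabulary.
reference = [0.70, 0.20, 0.10]
candidates = {
    "quant_a": [0.68, 0.21, 0.11],  # close to the reference
    "quant_b": [0.50, 0.30, 0.20],  # drifts further away
}

# Lower KL divergence means the quant stays closer to the reference output.
ranking = sorted(candidates, key=lambda name: kl_divergence(reference, candidates[name]))
for name in ranking:
    print(name, round(kl_divergence(reference, candidates[name]), 4))
```

A real comparison would average this over many token positions in a test corpus rather than use a single distribution.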