LocalLLaMA

Tool for Creating Your Own High-Quality GGUF Quants (Docs + Web UI)

/u/Thireus / April 10, 2026

For anyone interested in building their own GGUF quants, I’ve put together the GGUF-Tool-Suite docs and a simple web UI to make the process easier.
Docs: https://github.com/Thireus/GGUF-Tool-Suite/tree/main/docs
Web UI: https://gguf.thireus.com/quant_…

Stanford: Self-Improving Meta-Harness

/u/GodComplecs / April 10, 2026

We had prompt engineering, then context engineering, then agents and harnesses. Now we have Meta Harness, a harness that automatically corrects its own agentic mistakes, improves performance, and uses less context: https://arxiv.org/abs/2603.28052 "The p…

GLM 5.1 crushes every other model except Opus in an agentic benchmark, at about 1/3 of the cost of Opus

/u/zylskysniper / April 10, 2026

I want to know whether GLM is another benchmark-optimized model or is actually useful in agents like OpenClaw, so I test…

Finetuned a 270M model on CPU only – full weights, no LoRA, no GPU

/u/PromptInjection_ / April 10, 2026

Fine-tuned Gemma 3 270M on CPU only: full weights, no LoRA, no GPU, no cloud compute. All it took was ms-swift and a few minutes of patience. I deliberately used a small, absurd dataset to make verification trivial: if the model outputs exactly what wasn't in its pretrainin…
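The verification idea above boils down to an exact-match check: if the fine-tuned model reproduces answers that could not have come from pretraining, the weights really changed. Below is a minimal sketch, assuming the model is wrapped in a plain `generate(prompt) -> str` callable. `exact_match_rate`, the dataset, and both stand-in models are hypothetical illustrations, not ms-swift APIs and not the poster's actual data.

```python
from typing import Callable, Dict, List


def exact_match_rate(generate: Callable[[str], str],
                     dataset: List[Dict[str, str]]) -> float:
    """Fraction of prompts whose generated text exactly equals the expected
    answer (after trimming surrounding whitespace)."""
    if not dataset:
        return 0.0
    hits = sum(
        generate(ex["prompt"]).strip() == ex["answer"].strip()
        for ex in dataset
    )
    return hits / len(dataset)


# Hypothetical "absurd" facts that no pretraining corpus should contain.
DATASET = [
    {"prompt": "What colour is the moon on Tuesdays?", "answer": "Plaid."},
    {"prompt": "Who invented the square wheel?", "answer": "Sir Bumpsalot."},
]

ANSWERS = {ex["prompt"]: ex["answer"] for ex in DATASET}


# Stand-ins for the base and fine-tuned models (hypothetical).
def base_model(prompt: str) -> str:
    return "I'm not sure."


def finetuned_model(prompt: str) -> str:
    return ANSWERS.get(prompt, "") + "\n"


print(exact_match_rate(base_model, DATASET))       # 0.0
print(exact_match_rate(finetuned_model, DATASET))  # 1.0
```

Because the answers are deliberately nonsensical, a score near 0.0 before training and near 1.0 after is strong evidence that the full-weight update took effect, without needing a benchmark.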

National University of Singapore Presents "DMax": A New Paradigm For Diffusion Language Models (dLLMs) Enabling Aggressive Parallel Decoding.

/u/44th--Hokage / April 10, 2026

TL;DR: DMax cleverly mitigates error accumulation by reformulating decoding as a progressive self-refinement process, allowing the model to correct its own erroneous predictions during generation. Abstract: We present DMax, a new paradigm for effic…
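To make "correcting its own erroneous predictions during generation" concrete, here is a toy sketch of confidence-thresholded parallel decoding in which every step re-predicts all positions, so a token committed earlier can be overwritten when the model later predicts something else with high confidence. This is a generic illustration, not DMax's actual algorithm: `refine_decode`, `toy_predict`, the 0.9 threshold, and the "force progress" fallback are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A prediction is (token, confidence) for one position.
Prediction = Tuple[str, float]
Predictor = Callable[[List[Optional[str]]], List[Prediction]]


def refine_decode(predict: Predictor, length: int,
                  threshold: float = 0.9, max_steps: int = 16
                  ) -> Tuple[List[Optional[str]], int]:
    """Toy parallel decoder that may revise already-committed tokens.

    Returns the final sequence (None = still masked) and the number of
    committed tokens that were later overwritten with a different token.
    """
    seq: List[Optional[str]] = [None] * length
    revisions = 0
    for _ in range(max_steps):
        preds = predict(seq)              # re-predict every position in parallel
        new_seq = list(seq)
        for i, (tok, conf) in enumerate(preds):
            if conf >= threshold:
                if seq[i] is not None and seq[i] != tok:
                    revisions += 1        # self-correction of an earlier commit
                new_seq[i] = tok
            elif seq[i] is not None and seq[i] != tok:
                new_seq[i] = None         # remask a token the model now doubts
        if new_seq == seq:
            masked = [i for i, t in enumerate(seq) if t is None]
            if not masked:
                break                     # fully decoded and stable
            best = max(masked, key=lambda i: preds[i][1])
            new_seq[best] = preds[best][0]  # force progress: commit best guess
        seq = new_seq
    return seq, revisions


TARGET = "hello"


def toy_predict(seq: List[Optional[str]]) -> List[Prediction]:
    """Hypothetical model: confident once the left neighbour is known,
    and confidently wrong about the last token until then."""
    preds: List[Prediction] = []
    for i, ch in enumerate(TARGET):
        if i == 0 or seq[i - 1] is not None:
            preds.append((ch, 0.95))
        elif i == len(TARGET) - 1:
            preds.append(("x", 0.92))     # premature, wrong guess
        else:
            preds.append((ch, 0.30))
    return preds


tokens, revisions = refine_decode(toy_predict, len(TARGET))
print("".join(t or "_" for t in tokens), revisions)  # hello 1
```

The toy model commits a wrong final token (`x`) in the first step; once its left neighbour is filled in, the same parallel pass overwrites it with `o` instead of letting the early error propagate. A strictly commit-once decoder would have kept `x`.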

More Gemma4 fixes in the past 24 hours

/u/andy2na / April 10, 2026

Reasoning budget fix (merged): https://github.com/ggml-org/llama.cpp/pull/21697
New chat templates from Google to fix tool calling:
31B: https://huggingface.co/google/gemma-4-31B-it/blob/main/chat_template.jinja
27B: https://huggingface.co/google/gemma…

Using OCR models with llama.cpp

/u/jacek2023 / April 10, 2026

https://huggingface.co/collections/ggml-org/ocr-models

Using OCR models with llama.cpp (by ngxson)

/u/paf1138 / April 10, 2026

GLM 5.1 tops the code arena rankings for open models

/u/Auralore / April 10, 2026

Creating a Pi Extension with Pi and Qwen3.5 27B

/u/FeiX7 / April 10, 2026

Following my last post about setting up Claude Code to work with local models, I received a recommendation in the comments to try **Pi**. The suggestion was based on its customizability and its superior harness for local models. Unlike Claude Code, whi…