Author name: /u/GodComplecs

Speculative decoding question, 665% speed increase

/u/GodComplecs / April 19, 2026

I'm using these settings in llama.cpp: `--spec-type ngram-map-k --spec-ngram-size-n 24 --draft-min 12 --draft-max 48`. What's the real reason? Let's say the prompt is for "minor changes in code": what differs between models? Gemma 4 31b: Doub…
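A likely part of the answer is that n-gram drafting pays off most when the output copies long spans of the prompt, which is exactly what "minor changes in code" produces. Below is a minimal sketch of generic n-gram (prompt-lookup) drafting with greedy verification, assuming `ngram_size`, `draft_min` and `draft_max` play roughly the roles of `--spec-ngram-size-n`, `--draft-min` and `--draft-max`. This is not llama.cpp's `ngram-map-k` implementation, and the function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def propose_draft(tokens: Sequence[T], ngram_size: int,
                  draft_min: int, draft_max: int) -> List[T]:
    """Hypothetical prompt-lookup drafter: find the most recent earlier
    occurrence of the last `ngram_size` tokens and propose up to `draft_max`
    tokens that followed it. Returns [] if no match or the draft is shorter
    than `draft_min`."""
    if len(tokens) < ngram_size + 1:
        return []
    key = list(tokens[-ngram_size:])
    # Search backwards, skipping the trailing n-gram itself.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if list(tokens[start:start + ngram_size]) == key:
            cont = list(tokens[start + ngram_size:start + ngram_size + draft_max])
            return cont if len(cont) >= draft_min else []
    return []


def verify_greedy(draft: Sequence[T], target_preds: Sequence[T]) -> List[T]:
    """Greedy verification: target_preds[i] is the target model's token at
    draft position i, plus one extra token after the last draft position
    (all from a single batched forward pass in a real system). Keep draft
    tokens while they agree; at the first mismatch take the target's token."""
    assert len(target_preds) == len(draft) + 1
    out: List[T] = []
    for i, d in enumerate(draft):
        if target_preds[i] != d:
            out.append(target_preds[i])
            return out
        out.append(d)
    out.append(target_preds[len(draft)])  # bonus token when all drafts pass
    return out


if __name__ == "__main__":
    code = "def add ( a , b ) : return a + b".split()
    # The model is rewriting the same function, so its output echoes the prompt.
    context = code + ["#", "now", "edit"] + "def add ( a".split()
    draft = propose_draft(context, ngram_size=3, draft_min=2, draft_max=6)
    print(draft)
    # Target keeps the code unchanged: the whole draft plus a bonus token.
    print(verify_greedy(draft, [",", "b", ")", ":", "return", "a", "+"]))
    # Target makes a "minor change" (b -> c): only one draft token survives.
    print(verify_greedy(draft, [",", "c", ")", ":", "return", "a", "+"]))
```

In this sketch an unchanged span accepts the whole draft in one verification step, while an edit cuts it short at the first differing token. So how much speedup you see depends on how often a given model's output reproduces the prompt verbatim rather than rephrasing it.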

LocalLLaMA

Stanford: Self-improving Meta-Harness

/u/GodComplecs / April 10, 2026

We had Prompt engineering, then Context engineering, then Agents and Harnesses. Now we have Meta-Harness, a harness that auto-corrects its agentic mistakes, improves performance, and uses less context: https://arxiv.org/abs/2603.28052 "The p…