Things I got wrong building a confidence evaluator for local LLMs [D]
I've been building **Autodidact**, a local-first AI agent framework. The central piece is a **confidence evaluator** – something that decides whether a small local model (Qwen 2.5 7B, Llama 3.1 8B, Mistral 7B) can answer a question, or whether to e…