I built a local AI companion with GWT, IIT proxy, ChromaDB hybrid retrieval, and Ollama fallback — here’s every architectural decision I made and why
Been building this for a while. Sharing now because it's past the point where I'm embarrassed by the code.

**The stack:**

* Python 3.12, 18k+ lines, 470+ tests passing
* Gemini 2.5 Flash (primary) + Ollama qwen3:4b (local fallback via circuit b…
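To make the fallback idea concrete, here's a minimal sketch of a circuit breaker that prefers a cloud model and drops to a local one after repeated failures. This is my own illustration, not the post's actual code: the class, the thresholds, and the `primary`/`fallback` callables are all hypothetical stand-ins.

```python
# Hypothetical sketch of a circuit-breaker fallback between a primary
# (cloud) model and a local model. Names and thresholds are assumptions.
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; retries after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def is_open(self):
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at >= self.reset_after:
            # Half-open: allow one trial call through to the primary.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()


def generate(prompt, primary, fallback, breaker):
    """Try the primary model; on an open circuit or an error, use the local fallback."""
    if not breaker.is_open():
        try:
            out = primary(prompt)
            breaker.record_success()
            return out
        except Exception:
            breaker.record_failure()
    return fallback(prompt)
```

In a real setup `primary` would wrap the Gemini client and `fallback` would call Ollama; the breaker just keeps a flaky cloud endpoint from adding latency to every request once it starts erroring.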