u/Ok_Significance_9109's original post about running a local LLM as a Claude Code subagent has been useful for a few days now. I took the scripts, used them for real work, and Claude kept rewriting bits until they ran smoothly (and stopped breaking). Long story short, I have Qwen 3.6 loaded in LM Studio, and I can use /ask-local to extract, inventory, audit, etc. It's like a free Haiku agent. Here are some test results:
Note the totals in the chart include the usual system prompt/claude.md stuff that always loads with a new session (in my case, 49k). So the tasks themselves only used 0.4k/3k Opus tokens, versus 13k/89k when Opus did it alone. In a working session with multiple uses you're guaranteed to save bigly.

As for quality, Qwen and Opus produced different but overlapping findings in the tests above. Qwen caught an architectural issue Opus missed; Opus caught a heading hierarchy issue Qwen missed. Neither was strictly better, they just noticed different things.

Much more info in the repo: https://github.com/alisorcorp/ask-local Runs on any OpenAI-compatible local server. Tested with unsloth's Qwen3.6-35B-A3B-MXFP4_MOE gguf on a 64GB M4 Max. 64k context window is needed for a good time.
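For anyone curious what "OpenAI-compatible local server" means in practice: the whole trick is just a chat-completions POST to the local endpoint. Here's a minimal sketch, not the repo's actual code; the `ask_local` name, the placeholder model id, and the `localhost:1234` port (LM Studio's default) are my assumptions:

```python
import json
import urllib.request

# LM Studio's default local server endpoint (assumption; adjust to your setup).
LM_STUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "local-model", max_tokens: int = 1024) -> dict:
    """Build an OpenAI-compatible chat-completions payload.

    "local-model" is a placeholder id; LM Studio serves whatever model is loaded.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,  # low temperature suits extraction/audit-style tasks
    }

def ask_local(prompt: str) -> str:
    """Send the prompt to the local server and return the model's reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        LM_STUDIO_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

A Claude Code subagent then only spends a handful of Opus tokens composing the prompt and reading the reply, while the heavy lifting runs locally for free.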