Using Qwen3.6 via LM Studio as a Claude Code subagent, saving 30x Opus tokens per task

u/Ok_Significance_9109's original post about running a local LLM as a Claude Code subagent has been useful for a few days now. I took the scripts, used them for real work, and let Claude keep rewriting bits until everything ran smoothly (and stopped breaking).

Long story short, I have Qwen 3.6 loaded in LM Studio, and I can use /ask-local to extract, inventory, audit, etc. It’s like a free Haiku agent. Here are some test results:
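Under the hood this is just an OpenAI-style chat completion against LM Studio's local server. A minimal sketch of the delegation step (the endpoint is LM Studio's default; the model id and system prompt are illustrative, not the actual /ask-local wiring, which lives in the repo):

```python
import json
import urllib.request

# LM Studio's default OpenAI-compatible endpoint; adjust if you changed the port.
BASE_URL = "http://localhost:1234/v1/chat/completions"

def build_request(task: str, file_contents: dict[str, str]) -> dict:
    """Pack a delegated task plus the relevant file contents into one payload.

    The system prompt here is a placeholder, not the one /ask-local ships with.
    """
    context = "\n\n".join(f"### {path}\n{text}" for path, text in file_contents.items())
    return {
        "model": "qwen3.6-35b-a3b",  # assumed model id, as loaded in LM Studio
        "messages": [
            {"role": "system", "content": "You are a code-inventory subagent. Answer concisely."},
            {"role": "user", "content": f"{task}\n\n{context}"},
        ],
        "temperature": 0.2,
    }

def ask_local(task: str, file_contents: dict[str, str]) -> str:
    """Send the payload to the local server and return the assistant's reply."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_request(task, file_contents)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The point is that only the subagent's short answer flows back into the Opus context; the full file contents stay local.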

| Task | Files involved | Opus 4.7 direct | ask-local | Per-task ratio |
|---|---|---|---|---|
| Inventory every route under app/api/admin: method, path, auth check, purpose, DB tables | 23 route files | 13k marginal (62k total) | 0.4k marginal (49.4k total) | ~30× |
| Full page inventory of an Astro site: H1, H2s, meta, CTA, disclaimer per page + layout details + consistency review | 18 files (14 pages + 4 layouts) | 89k marginal (138k total) | 3k marginal (52k total) | ~30× |

Note that the totals above include the usual system prompt/claude.md overhead that loads with every new session (49k in my case). So the tasks themselves used only 0.4k/3k Opus tokens, versus 13k/89k when Opus did it alone. In a working session with multiple uses, you save bigly.
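The marginal numbers fall out of subtracting that fixed session overhead from the totals; a quick sanity check of the route-inventory row:

```python
# Fixed per-session cost: system prompt + claude.md, ~49k tokens in this setup.
SESSION_OVERHEAD = 49_000

def marginal(session_total: int) -> int:
    """Tokens actually spent on the task itself."""
    return session_total - SESSION_OVERHEAD

# Route-inventory task: Opus doing the work directly vs. delegating via ask-local.
opus_direct = marginal(62_000)      # 13,000 tokens
delegated   = marginal(49_400)      #    400 tokens
ratio = opus_direct / delegated     # 32.5, i.e. the "~30×" in the table
```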

As for quality, Qwen and Opus produced different but overlapping findings in the tests above: Qwen caught an architectural issue Opus missed; Opus caught a heading-hierarchy issue Qwen missed. Neither was strictly better, they just noticed different things.

Much more info in the repo: https://github.com/alisorcorp/ask-local

Runs against any OpenAI-compatible local server. Tested with unsloth’s Qwen3.6-35B-A3B-MXFP4_MOE GGUF on a 64GB M4 Max. You’ll want a 64k context window for a good time.

submitted by /u/DeliciousGorilla
