| EDIT 2: Trick-Assignment-828 pointed me at the actual rule update from the mods - Rule 3 Low Effort was expanded to cover LLM-assisted posts without disclosure. Disclosing now: Disclosure: I'm a non-native English speaker (German). This post was drafted by me with AI used for a grammar pass. Structure, technical content, the ask, the Skales reference, and all decisions are mine. Wasn't aware of the rule update until called out in comments. Apologies for missing it. If mods feel this still violates Rule 3 even with disclosure, happy for it to come down. Otherwise leaving up so the technical thread can continue. --- EDIT: Updated model list based on this thread's feedback... [Original post body] Edit: Updated model list based on this thread's feedback — Qwen3.5/3.6 family and Qwen3.6-35B-A3B are the current recommendations, not the older Qwen2.5-VL / Llama 3.2 Vision references that were carried over from my older post. Thanks jacky2060, ilintar, and others for the corrections. --- Follow-up to my latest post asking about fast vision-capable local models with reliable tool calling. Got really helpful answers from this sub. Building it out now and need beta testers before the v1.2.0 release next week. --- Context for those who didn't see the first post: AIPointer is an open-source desktop overlay (Mac/Win/Linux, MIT, github.com/gonemedia/aipointer). Hold a key or wiggle your mouse, a box pops up next to your cursor, you ask anything about whatever's under the pointer, get an answer. Currently routes through cloud providers (OpenRouter, Anthropic, OpenAI, Gemini). Default UX target: sub-2s time-to-first-token. --- Based on this sub's recommendations from the earlier thread, I'm implementing Ollama as a first-class built-in provider for v1.2.0. Initial implementation supports:
A note on prior experience I've built another open-source desktop AI agent (Skales, also solo) which supports 15+ LLM providers including Ollama, LM Studio, KoboldCpp, vLLM, and any OpenAI-compatible endpoint. So the local-inference plumbing isn't new territory for me - the codepath, the tool-call schema handling, the streaming, the fallback logic, all of that I know from running it in production. What's new for AIPointer specifically is the vision + tools combination under a sub-2s TTFT budget. That's where I want real-world numbers from this sub. What I cannot test alone I have one M1 Pro and an Intel 2019 MBP. That's a single Apple Silicon data point from 2021 - says nothing about M2/M3/M4, Pro/Max/Ultra tiers, RAM scaling, RTX 3090/4090, AMD inference paths, AppImage on different distros, or Windows + NVIDIA setups. Solo dev, no test lab.. What I'm looking for Beta testers with any of:
What I'd ask testers to do
I'll fold the feedback into the v1.2.0 release notes and credit testers/contributor if you want. If we find that one model + one inference setup consistently delivers sub-2s TTFT with reliable tool calls on consumer hardware, that becomes the recommended default in onboarding. I'm not building this to compete with anyone. There's a Chrome-locked cursor companion from a big lab making the rounds, but I'd rather have a system-wide open one/open sourced that actually runs locally for people in this sub. Drop a comment with hardware + model preference and I'll DM build links. Or just grab the v1.1.1 from aipointer.app today and try cloud-mode first while you wait for v1.2.0. Source: github.com/gonemedia/aipointer (MIT) [link] [comments] |