Follow-up: adding Ollama support to my open-source cursor-aware AI app – looking for beta testers with vision-capable local models

EDIT 2: Trick-Assignment-828 pointed me at the actual rule update from the mods - Rule 3 Low Effort was expanded to cover LLM-assisted posts without disclosure. Disclosing now:

Disclosure: I'm a non-native English speaker (German). This post was drafted by me with AI used for a grammar pass. Structure, technical content, the ask, the Skales reference, and all decisions are mine. Wasn't aware of the rule update until called out in comments. Apologies for missing it.

If mods feel this still violates Rule 3 even with disclosure, happy for it to come down. Otherwise leaving up so the technical thread can continue.

---

EDIT: Updated model list based on this thread's feedback...

[Original post body]

Edit: Updated model list based on this thread's feedback — Qwen3.5/3.6 family and Qwen3.6-35B-A3B are the current recommendations, not the older Qwen2.5-VL / Llama 3.2 Vision references that were carried over from my older post. Thanks jacky2060, ilintar, and others for the corrections.

---

Follow-up to my latest post asking about fast vision-capable local models with reliable tool calling. Got really helpful answers from this sub. Building it out now and need beta testers before the v1.2.0 release next week.

---

Context for those who didn't see the first post:

AIPointer is an open-source desktop overlay (Mac/Win/Linux, MIT, github.com/gonemedia/aipointer). Hold a key or wiggle your mouse, a box pops up next to your cursor, you ask anything about whatever's under the pointer, get an answer. Currently routes through cloud providers (OpenRouter, Anthropic, OpenAI, Gemini). Default UX target: sub-2s time-to-first-token.

---

Based on this sub's recommendations from the earlier thread, I'm implementing Ollama as a first-class built-in provider for v1.2.0.

Initial implementation supports:

Auto-detect on localhost:11434
Model dropdown populated from /api/tags
Vision + text input pipeline (region screenshot routes to vision model)
Tool calling for AIPointer's 10 built-in tools (fetch_url, open_url, search_web, play_music, set_volume, copy_to_clipboard, read_clipboard, launch_app, save_document, reveal_in_finder)
Per-model timeout (uncapped option for large models on slower hardware)
Same config UX as the cloud providers — just point it at Ollama, pick model, done

A note on prior experience

I've built another open-source desktop AI agent (Skales, also solo) which supports 15+ LLM providers including Ollama, LM Studio, KoboldCpp, vLLM, and any OpenAI-compatible endpoint. So the local-inference plumbing isn't new territory for me - the codepath, the tool-call schema handling, the streaming, the fallback logic, all of that I know from running it in production. What's new for AIPointer specifically is the vision + tools combination under a sub-2s TTFT budget. That's where I want real-world numbers from this sub.

What I cannot test alone

I have one M1 Pro and an Intel 2019 MBP. That's a single Apple Silicon data point from 2021 - says nothing about M2/M3/M4, Pro/Max/Ultra tiers, RAM scaling, RTX 3090/4090, AMD inference paths, AppImage on different distros, or Windows + NVIDIA setups. Solo dev, no test lab..

What I'm looking for

Beta testers with any of:

M-series Mac (M1/M2/M3/M4, Pro/Max/Ultra) - measuring TTFT against Gemini 2+3 Flash cloud baseline
RTX 3090, 4090, or 5090 on Windows or Linux - same baseline
AMD GPU on Linux (ROCm) - would love to know if this works at all
16GB-class VRAM cards - checking what's the realistic model ceiling
Mac mini M4 or M4 Pro - fastest consumer Apple Silicon, want to see TTFT

What I'd ask testers to do

Install AIPointer (signed + notarized on Mac, NSIS on Windows, AppImage on Linux)
Point it at your local Ollama, pick a vision model (Qwen2.5-VL, MiniCPM-V, Llama 3.2 Vision, Pixtral, whatever you already have running)
Use it for 30-60 minutes of normal daily stuff - screenshots, region queries, tool calls
Send back: TTFT numbers, model + quant + hardware, what worked, what didn't, any tool-call failures

I'll fold the feedback into the v1.2.0 release notes and credit testers/contributor if you want. If we find that one model + one inference setup consistently delivers sub-2s TTFT with reliable tool calls on consumer hardware, that becomes the recommended default in onboarding.

I'm not building this to compete with anyone. There's a Chrome-locked cursor companion from a big lab making the rounds, but I'd rather have a system-wide open one/open sourced that actually runs locally for people in this sub.

Drop a comment with hardware + model preference and I'll DM build links. Or just grab the v1.1.1 from aipointer.app today and try cloud-mode first while you wait for v1.2.0.

Source: github.com/gonemedia/aipointer (MIT)

submitted by /u/yaboyskales
[link] [comments]

Leave a Comment