After 8 months of running everything local, I've accepted that the productivity tools have to be local too

Quick context: M3 Max, 64GB. Currently running Llama 3.3 70B at Q4 as my daily driver via Ollama, Qwen3 Coder 30B for code (switched from Qwen2.5 earlier this year), MLX for the smaller stuff. Tried Llama 4 Scout earlier this year, but 64GB is too tight at Q4 once you leave headroom for context, so I'm sticking with 3.3 70B until I upgrade hardware. I moved off the Claude API around September last year. The cost wasn't the main reason; the data was. I do contract work under NDAs, and I was getting uncomfortable about how much code and how many internal docs were going through Anthropic's servers.

Here's the thing I didn't expect: once I moved my LLM local, everything else upstream and downstream of the LLM started feeling wrong by comparison.

Example: I was on Rewind.ai for a long time because the screen-memory thing is genuinely useful for me. Then Meta bought them in December and the Mac app went away, like everyone else's. I tried a couple of the DIY alternatives that picked up steam after that, Screenpipe in particular, but none of them quite stuck for me.

I was using Granola for meeting notes. Same story: the transcripts go up. Uninstalled.

I think a lot of you are running into this same thing if you're being honest. You spend 6 months building a private LLM stack, and then you realize half the productivity stuff orbiting your work is still shipping your raw artifacts up to a SaaS backend: full meeting transcripts, full doc bodies, sometimes whole screen frames. You haven't gone local on the data side at all; you've only gone local on inference. Those are two different fights, and most people haven't had the second one yet. And even on the data side it's a spectrum, not a switch. "Raw never leaves the machine" is a different bar than "no packets out at all," and you have to decide which one is your threshold before you can pick tools.

What I ended up with, as of now:

Ollama for inference: Llama 3.3 70B Q4 for general use, Qwen3 Coder 30B for code, nomic-embed-text-v2 for vectors (rough sketch of how this side gets wired together below the list)
Local whisper.cpp for transcription; the CPU lift is fine for my use
Obsidian, plain Markdown, on an encrypted volume on disk
AirJelly for screen and cross-app memory, the one item in this list that isn't strictly local; caveats below
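For anyone wondering what the glue looks like on the Ollama side, here's a minimal sketch of the chat and embedding calls against its local HTTP API. Treat the model tags as placeholders for whatever you've actually pulled (the nomic v2 tag in particular varies by build), not as gospel.

```python
# Minimal sketch, not my actual scripts: local chat + embeddings through
# Ollama's HTTP API on localhost. Model tags are assumptions; use whatever
# you have pulled locally.
import requests

OLLAMA = "http://localhost:11434"

def chat(prompt: str, model: str = "llama3.3:70b") -> str:
    # /api/generate streams by default; stream=False returns one JSON blob
    r = requests.post(f"{OLLAMA}/api/generate",
                      json={"model": model, "prompt": prompt, "stream": False},
                      timeout=600)
    r.raise_for_status()
    return r.json()["response"]

def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    # /api/embeddings returns {"embedding": [...]}
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": model, "prompt": text},
                      timeout=120)
    r.raise_for_status()
    return r.json()["embedding"]

if __name__ == "__main__":
    print(chat("Summarize continuous batching in two sentences.")[:200])
    print(len(embed("continuous batching in vLLM")))  # embedding dimension
```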

I was most skeptical about that last one. To be upfront, AirJelly isn't pure offline. The captures themselves are stored on disk locally, but the vision-model inference round-trips to their backend with just the cropped frame it needs to analyze. Raw screen history doesn't leave the machine, which was my actual threshold; if your bar is "literally no packets out," this isn't that. No, I can't point the vision side at my own Ollama, which I would have preferred. Their argument is that it's a specific finetune they control. Fair, not going to fight it.

Before paying for it I did try to roll my own: the macOS screencapture CLI + Tesseract + a local embedding model into SQLite. Got it to toy level over a weekend. What I couldn't solve alone was the cross-app event correlation (knowing which Slack thread I was on while reading a specific commit), and the index updates destroyed my battery once the corpus got real. I ended up writing more memory plumbing than I wanted to, and decided I'd rather spend that time on actual contract work.
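If you want to attempt the DIY route anyway, the toy version really is a weekend thing. Here's roughly the shape of mine, assuming macOS screencapture, pytesseract for OCR, and an Ollama embedding model; the parts that actually matter (cross-app correlation, battery-friendly indexing) are exactly what's missing here.

```python
# Toy screen-memory loop, roughly what my weekend version did. Assumes macOS
# `screencapture`, Tesseract via pytesseract, and a local Ollama embedding model.
# This is the easy part; event correlation and efficient indexing are not here.
import json
import sqlite3
import subprocess
import time

import requests
from PIL import Image
import pytesseract

DB = sqlite3.connect("screen_memory.db")
DB.execute("""CREATE TABLE IF NOT EXISTS frames
              (ts REAL, text TEXT, embedding TEXT)""")

def capture(path: str = "/tmp/frame.png") -> str:
    # -x suppresses the shutter sound; captures the main display
    subprocess.run(["screencapture", "-x", path], check=True)
    return path

def ocr(path: str) -> str:
    return pytesseract.image_to_string(Image.open(path))

def embed(text: str) -> list[float]:
    r = requests.post("http://localhost:11434/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text[:8000]},
                      timeout=60)
    r.raise_for_status()
    return r.json()["embedding"]

while True:
    frame = capture()
    text = ocr(frame).strip()
    if text:
        DB.execute("INSERT INTO frames VALUES (?, ?, ?)",
                   (time.time(), text, json.dumps(embed(text))))
        DB.commit()
    time.sleep(30)  # capture every 30s; this naive polling is why the battery suffers
```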

What I use it for is the cross-repo thing: reading three different inference-server implementations to compare how they handle continuous batching, and two weeks later trying to remember which one had speculative decoding wired in cleanly. I used to lose those threads mid-day. It's not perfect. Sometimes the search-side model hallucinates a snippet that isn't actually in the history, and occasionally it returns an answer in Korean for no obvious reason; classic LLM stuff. But it's better than the bookmark folder I used to maintain.

This stack isn't as smart as the frontier APIs. Claude Opus 4.7 is still better at long-context reasoning than my local setup by a real margin. But for 80% of my actual work the gap doesn't matter, and for the 20% where it does, I just don't send sensitive stuff to the API anymore. For non-sensitive stuff I still hit the API directly from a clean machine.

The meta point is that going local on the model is the easy part. It's everything around the model that takes commitment. If you've done the LLM part already, the rest is just one tool at a time.
