| 99% of "AI" apps are just GPT wrappers that pipe your data to cloud LLMs and call it a product. No one's ever created an intelligence layer that understands your entire digital life (all your screenshots, notes, files...) before, because that’d mean sending all your data to the cloud:
But on-device models are generally too dumb and run too slowly. So I spent close to a year optimizing every single layer of the on-device AI stack from scratch: I modified Apple's MLX framework for batched multimodal inference (it wasn't built for this), transplanted vision capabilities from a 4× larger model [Qwen 3.5 9B] into a smaller one [Qwen 3.5 2B], built custom k-quants specifically for MLX, wrote device-aware quantization tuned to each chip's available RAM, and implemented proprietary KV cache reuse + flash attention for inference speed. (There's a rough sketch of the device-aware quantization idea at the bottom of this post.)

Sentient OS analyzes and understands your entire digital life overnight while your device charges. This unlocks:

1️⃣ Talk to your entire digital life: "what was that wine I liked?" "who did I wanna meet next week?"

2️⃣ Proactive reminders surfaced from your own data: "Tickets for that concert you screenshotted open tomorrow!"

3️⃣ Knowledge graphs of your entire digital life: tap any node to find what you buried!

And with MCP, your existing LLM (ChatGPT, Claude, etc.) can talk to your digital life too, so it actually understands you (there's a sketch of that below as well).

The early alpha processes ~3,000 screenshots entirely on-device on a 6-year-old iPhone. Coming to Mac & iPhone soon (and Windows & Android in the near future!). The first 150 users get lifetime free access 🔑

Would really love to hear from y'all: what more would you want an on-device multimodal LLM that understands your entire life to do?
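For the curious, here's roughly what I mean by "device-aware quantization". This is a minimal, illustrative sketch, not Sentient OS's actual code: the RAM thresholds, the `pick_quant_config` helper, and the placeholder model id are all made up, and it assumes mlx-lm's `convert` API on the Python side.

```python
# Illustrative sketch of device-aware quantization: pick a quant config from
# the RAM actually available on this machine, then convert the model with
# mlx-lm. Thresholds, helper names, and the model id are hypothetical.
import psutil
from mlx_lm import convert

def pick_quant_config(available_gb: float) -> dict:
    """Map free RAM to (bits, group size). Lower bit widths shrink the model;
    smaller group sizes claw back some accuracy at those low bit widths."""
    if available_gb >= 12:
        return {"q_bits": 8, "q_group_size": 64}   # plenty of headroom
    if available_gb >= 6:
        return {"q_bits": 4, "q_group_size": 64}   # typical M-series laptop
    return {"q_bits": 3, "q_group_size": 32}       # older / low-RAM devices

available_gb = psutil.virtual_memory().available / 1e9
cfg = pick_quant_config(available_gb)

# Placeholder model id, not the model described in this post.
convert(
    "Qwen/Qwen2.5-3B-Instruct",
    mlx_path="ondevice-model",
    quantize=True,
    **cfg,
)
```

A real version would presumably look at more than free RAM (other resident apps, thermals, how big a KV cache you want to keep around), but the shape of the decision is the same.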
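And for the MCP part: a connector can be as small as one tool that lets your desktop LLM query the local index. Here's a minimal sketch using the official MCP Python SDK (FastMCP); the `search_index` function and its result format are hypothetical stand-ins, not the actual implementation.

```python
# Minimal sketch of an MCP server exposing a "search my digital life" tool,
# using the official MCP Python SDK (FastMCP). search_index() and its result
# shape are hypothetical stand-ins for a real on-device index.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("digital-life")

def search_index(query: str, limit: int) -> list[dict]:
    """Hypothetical local search over screenshots/notes/files; swap in your
    own on-device store (SQLite FTS, a local vector index, etc.)."""
    return [{"source": "screenshot", "title": "example hit", "snippet": query}][:limit]

@mcp.tool()
def search_digital_life(query: str, limit: int = 5) -> str:
    """Search the user's on-device index and return matching items as text."""
    hits = search_index(query, limit)
    return "\n".join(f"[{h['source']}] {h['title']}: {h['snippet']}" for h in hits)

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, so a desktop LLM client can attach
```

Point an MCP-capable client (Claude Desktop, etc.) at a server like this and questions like "what was that wine I liked?" can resolve against your own index instead of a guess.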