I've been building a macOS app called Skiagrafia that takes folders of photos and produces layered SVG vector graphics and TIFF alpha mattes. The entire inference stack runs locally: TRANSFORMERS_OFFLINE=1, HF_HUB_OFFLINE=1, Ollama for the VLM, and MPS as the primary backend on an M1 Ultra. My first plan for the pipeline:
Total weight resident in unified memory: ~5GB. It runs fine on a 64GB M1 Ultra.

The part that might interest this community: Moondream was chosen over LLaVA, MiniCPM-V, and LlamaVL specifically because, at ~1.5GB, it processes an image in ~100ms on MPS. For a 2,000-image batch, a 7B model's richer descriptions don't justify a 10× increase in inference time when all you need is a noun list. Small and fast wins for this task.

I wrote up the full architecture, including the model-selection rationale, the Protocol-based DI system, recursive child segmentation, and five design principles distilled from five rewrites. Article: tsevis.com/every-pixel-is-a-tesserae

Happy to answer questions about the MPS compatibility issues I ran into or the Ollama integration.
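For anyone wondering how "fully offline" is enforced: it's just the two environment variables plus an MPS-with-CPU-fallback device pick. A minimal sketch — the helper name is mine, not the app's actual code:

```python
import os

# Force Hugging Face libraries to never touch the network.
os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"


def pick_device() -> str:
    """Prefer Apple's MPS backend, fall back to CPU.

    torch is imported lazily so the rest of the app can still load
    (and be tested) on machines without PyTorch installed.
    """
    try:
        import torch
        if torch.backends.mps.is_available():
            return "mps"
    except ImportError:
        pass
    return "cpu"
```

Setting the variables before the first `transformers` import matters; they're read at load time.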
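The Ollama side is just a plain HTTP POST to the local server's `/api/generate` endpoint with the image base64-encoded. A rough sketch of the noun-list request — the prompt wording and function names are illustrative, not the app's actual code:

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint


def build_request(image_path: str, model: str = "moondream") -> dict:
    """Build the JSON payload for a single-image Ollama generate call."""
    with open(image_path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "prompt": "List the main objects in this image as a comma-separated list of nouns.",
        "images": [img_b64],
        "stream": False,  # one JSON response instead of a token stream
    }


def parse_nouns(response_text: str) -> list[str]:
    """Turn the model's comma-separated answer into a clean noun list."""
    return [w.strip().lower() for w in response_text.split(",") if w.strip()]


def describe(image_path: str) -> list[str]:
    """Send one image to the local Ollama server and return a noun list."""
    payload = json.dumps(build_request(image_path)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return parse_nouns(body.get("response", ""))
```

With `stream: False` the server returns a single JSON object whose `response` field holds the whole answer, which keeps the batch loop trivial.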
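The Protocol-based DI is nothing exotic: each stage is typed as a `typing.Protocol`, so tests inject cheap fakes while the real app injects the MPS-backed implementations. A simplified sketch with hypothetical stage names:

```python
from typing import Protocol


class Segmenter(Protocol):
    """Anything that can turn an image into a list of mask files."""
    def segment(self, image_path: str) -> list[str]: ...


class Describer(Protocol):
    """Anything that can label an image with a noun list."""
    def describe(self, image_path: str) -> list[str]: ...


class Pipeline:
    """Stages arrive via constructor injection; no stage is hard-wired."""

    def __init__(self, segmenter: Segmenter, describer: Describer):
        self.segmenter = segmenter
        self.describer = describer

    def process(self, image_path: str) -> dict:
        return {
            "masks": self.segmenter.segment(image_path),
            "labels": self.describer.describe(image_path),
        }


# Fakes for tests — no model weights needed, thanks to structural typing.
class FakeSegmenter:
    def segment(self, image_path: str) -> list[str]:
        return ["mask0.tiff"]


class FakeDescriber:
    def describe(self, image_path: str) -> list[str]:
        return ["dog", "tree"]
```

Because `Protocol` uses structural typing, the fakes don't inherit from anything; they just have the right method shapes, which keeps test setup to a couple of lines.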