Last Week in Multimodal AI – Local Edition

I curate a weekly multimodal AI roundup; here are the local/open-source highlights from last week:

  • Google Gemma 4 - Open model family for coding and logical reasoning with a massive context window. Runs on a single machine. Post | Models
  • TII Falcon Perception - 0.6B early-fusion VLM with open-vocabulary grounding, segmentation, and OCR. Punches way above its weight. Post | Hugging Face
  • IBM Granite 4.0 3B Vision - Compact document intelligence model for visual reasoning and data extraction. Post | Model
  • CutClaw - Open multi-agent framework that autonomously edits hours of footage into narrative short videos. Paper | GitHub | Hugging Face

  • Gen-Searcher - Image generation using agentic search across styles. Hugging Face | GitHub

  • GEMS - Closed-loop generation for spatial logic and text rendering. Outperforms Nano Banana 2 on GenEval2. Paper | GitHub

  • ComfyUI Post-Processing Suite - Photorealism suite by thezveroboy. Simulates sensor noise, analog artifacts, and camera metadata with base64 EXIF transfer and calibrated DNG writing. GitHub

  • Flux FaceIR - FLUX.2-klein LoRA for blind or reference-guided face restoration. GitHub

  • Netflix VOID - Video object deletion with physics simulation. Built on CogVideoX-5B and SAM 2. Project | Hugging Face Space

  • Flux-restoration - Unified face restoration LoRA on FLUX.2-klein-base-4B. GitHub

https://preview.redd.it/uc2mdztalwtg1.png?width=1080&format=png&auto=webp&s=a16319c50496e68f6cf9a677d49ec90bf651a287
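
The ComfyUI Post-Processing Suite entry says it simulates sensor noise. As a rough illustration of what that kind of simulation usually involves, here is a minimal sketch of the common Poisson-Gaussian sensor noise model in plain Python. This is not thezveroboy's implementation: the function name `add_sensor_noise` and its `gain` and `read_sigma` parameters are hypothetical, and shot noise is approximated as Gaussian instead of being sampled from a true Poisson distribution.

```python
import math
import random


def add_sensor_noise(pixels, gain=0.5, read_sigma=2.0, seed=None):
    """Return 8-bit pixel values with approximate sensor noise added.

    Hypothetical parameters (not taken from the ComfyUI suite):
      gain       -- digital numbers (DN) per photo-electron
      read_sigma -- std. dev. of signal-independent read noise, in DN
    """
    rng = random.Random(seed)
    out = []
    for v in pixels:
        # Shot noise is Poisson in electron counts; for moderate signals it is
        # approximated here by a Gaussian whose variance in DN is gain * v.
        shot_var = gain * max(v, 0)
        # Read noise adds a constant variance regardless of signal level.
        sigma = math.sqrt(shot_var + read_sigma ** 2)
        noisy = v + rng.gauss(0.0, sigma)
        # Quantize and clip back to the valid 8-bit range.
        out.append(min(255, max(0, round(noisy))))
    return out


if __name__ == "__main__":
    flat_gray = [128] * 8
    print(add_sensor_noise(flat_gray, seed=0))
```

Because the shot-noise variance scales with the signal, brighter pixels end up noisier in absolute terms, which is one reason this model tends to look more camera-like than adding uniform Gaussian noise everywhere.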

Check out the full roundup for more demos, papers, and resources.

submitted by /u/Vast_Yak_4147