I'm making PixelClaw, an LLM agent specialized for image processing. It combines:
- an LLM for conversation, planning, and tool use (supports a variety of LLMs)
- image generation and AI-based editing via gpt-image
- background removal via rembg (several specialized models available)
- pixelization using pyxelate
- posterization and defringing using custom algorithms
- speech-to-text (Whisper) and text-to-speech (Kokoro plus HALO)
- a nice UI based on Raylib, including file drag-and-drop
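
For readers unfamiliar with posterization, here is a minimal sketch of the general idea in plain Python: each color channel is snapped to one of a few evenly spaced levels. This is a textbook uniform quantizer, not PixelClaw's custom algorithm, which is not described here. The names `posterize_channel` and `posterize` are hypothetical and chosen only for illustration.

```python
def posterize_channel(value: int, levels: int) -> int:
    """Snap one 0-255 channel value to the nearest of `levels` evenly spaced values."""
    if levels < 2:
        raise ValueError("levels must be at least 2")
    step = 255 / (levels - 1)  # distance between allowed output values
    return round(round(value / step) * step)


def posterize(pixels, levels=4):
    """Posterize a list of (r, g, b) tuples (illustrative only, not the project's method)."""
    return [tuple(posterize_channel(c, levels) for c in px) for px in pixels]


if __name__ == "__main__":
    # With 4 levels the allowed channel values are 0, 85, 170, and 255.
    print(posterize([(200, 100, 30)], levels=4))  # → [(170, 85, 0)]
```

A real tool would more likely pick the palette from the image itself (for example by clustering colors) rather than using fixed levels, but the fixed-level version shows the effect with the least code.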
PixelClaw is free and open-source at https://github.com/JoeStrout/PixelClaw/ . You can find more demo videos there too. While you're there, if you find it interesting, please click the star ⭐️ at the top of the page; that helps me gauge interest.