Shipped an Open WebUI plugin that lets your local model render interactive visualizations inline in chat — painted live, token-by-token, as the model generates.
Not static. Not "the diagram appears when the response finishes." The SVG literally assembles itself in front of you as tokens stream in. Cards appear one at a time. Chart.js bars populate column by column. First elements render within ~50ms of the opening marker.
How it works
The tool mounts an empty iframe. The model then emits HTML/SVG between plain-text @@@VIZ-START ... @@@VIZ-END markers in its response. A same-origin iframe observer tails the parent chat DOM, extracts the growing block, runs it through a safe-cut HTML parser, and reconciles new nodes into the iframe as tokens arrive.
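The extraction step can be sketched as follows (the marker strings come from the post; the function and field names here are illustrative, not the tool's actual API):

```javascript
// On each observer tick, pull the still-growing viz block out of the
// streamed message text. While the stream is mid-flight, the END marker
// may not have arrived yet, so we take everything after START so far.
const START = "@@@VIZ-START";
const END = "@@@VIZ-END";

function extractVizBlock(messageText) {
  const start = messageText.indexOf(START);
  if (start === -1) return null;               // no block started yet
  const bodyStart = start + START.length;
  const end = messageText.indexOf(END, bodyStart);
  const body = end === -1
    ? messageText.slice(bodyStart)             // still streaming
    : messageText.slice(bodyStart, end);       // block complete
  return { body, complete: end !== -1 };
}
```

Each call returns the current prefix of the block, which is then handed to the safe-cut parser described below.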
Why it's not trivial
Streaming partial HTML into a live iframe without breaking it is harder than re-assigning innerHTML with the partial markup on every chunk. The naive approach gives constant flicker, retriggered animations, and scripts executing before their dependencies load.
The safe-cut HTML parser tracks tokenizer state across TEXT / TAG / ATTR / script-data-escape / CDATA transitions and flushes the longest valid prefix on each chunk. This matters because models emit <script> tags inside SVG containing <!-- and <script as string literals; a naive cut would corrupt the enclosing script boundary.
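A drastically simplified sketch of the idea (the real tokenizer also handles attribute-value states, script-data-escape, comments, and CDATA, none of which are modeled here):

```javascript
// Return the length of the longest prefix of `html` that is safe to flush:
// never cut inside an open tag, and hold back <script> content entirely
// until its closing tag has arrived, so partial script text never executes.
function safeCutPoint(html) {
  let state = "TEXT"; // TEXT | TAG | SCRIPT
  let safe = 0;       // longest flushable prefix seen so far
  let tagStart = 0;
  for (let i = 0; i < html.length; i++) {
    const ch = html[i];
    if (state === "TEXT") {
      if (ch === "<") { state = "TAG"; tagStart = i; }
      else safe = i + 1;                         // plain text: always safe
    } else if (state === "TAG") {
      if (ch === ">") {
        const tag = html.slice(tagStart, i + 1).toLowerCase();
        if (/^<script[\s>]/.test(tag)) {
          state = "SCRIPT";                      // hold back until </script>
        } else {
          state = "TEXT";
          safe = i + 1;                          // completed non-script tag
        }
      }
    } else { // SCRIPT: ignore everything (even "<!--") until the close tag
      if (ch === ">" && html.slice(0, i + 1).toLowerCase().endsWith("</script>")) {
        state = "TEXT";
        safe = i + 1;                            // whole script flushed at once
      }
    }
  }
  return safe;
}
```

Note how a "<!--" string literal inside the script body is inert here, because SCRIPT state only watches for the closing tag; that is exactly the failure a naive cut runs into.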
The incremental DOM reconciler parses each safe-cut chunk into a detached tree and walks it in parallel with the live tree, appending only new nodes. Existing nodes never re-mount, so animations don't retrigger and scroll position holds through 10k-line SVGs.
Local model compatibility
Any model that can reliably follow "emit text between these two markers and use this design system" works. Tested on:
- GPT-OSS 120B — solid
- Qwen 3.5 27B — I found it surprisingly good myself, and multiple users confirmed as much in the v1 thread
- Haiku 4.5 / Sonnet 4.5 / Opus 4.7 / GPT-5.4 — reference frontier models
The skill file (shipped alongside the tool) teaches the model the protocol, design system, color ramps, SVG boilerplate, sendPrompt patterns, and common failure modes. The model doesn't have to invent any of that — it just follows the spec.
Latency
The tool call itself takes ~50ms (it just builds the HTML wrapper and CSP headers). All render cost is paid during token generation, in parallel with it. No separate rendering phase.
What you actually use it for
- Clickable architecture diagrams (click a component → model explains it)
- Interactive quizzes that grade themselves via a sendPrompt bridge
- Dashboards generated from Python tool output
- Live study cards with sliders for inference params
- Periodic tables where clicking an element drills down
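The self-grading quiz pattern is worth a sketch. `sendPrompt` is the bridge named in the post, but its exact signature is an assumption here, as is all the glue code:

```javascript
// Hypothetical glue for a self-grading quiz inside a generated viz:
// collect the picked answers into a prompt and hand it back to the model.
function buildGradingPrompt(answers) {
  return "Grade this quiz attempt:\n" +
    answers.map((a, i) => `Q${i + 1}: ${a}`).join("\n");
}

// In the rendered viz this would be wired to a button, e.g. (assumed API):
//   button.onclick = () => sendPrompt(buildGradingPrompt(collectAnswers()));
```

The point is the round trip: the visualization doesn't grade anything itself, it just feeds the user's selections back into the chat as a new prompt.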
Other stuff included
- Six JS bridges (sendPrompt, openLink, copyText, toast, saveState, loadState — the last two are per-message-scoped localStorage)
- 9-ramp color system with automatic light/dark adaptation
- Three CSP levels (strict / balanced / none)
- 230 localized strings across 46 languages
- Done toast + optional chime on stream finalize (only fires on witnessed streams, silent on refreshes)
- 30s idle-finalize fallback for stream stalls
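The per-message scoping of saveState/loadState could work roughly like this. Only the bridge names come from the post; the key format, constructor, and storage interface are assumptions for illustration:

```javascript
// Namespace storage keys by message id so each visualization's state
// is isolated from every other message's, even in the same chat.
function makeStateBridge(messageId, storage) {
  const scoped = (k) => `viz:${messageId}:${k}`;
  return {
    saveState: (k, value) => storage.setItem(scoped(k), JSON.stringify(value)),
    loadState: (k) => {
      const raw = storage.getItem(scoped(k));
      return raw === null ? null : JSON.parse(raw);
    },
  };
}
```

`storage` would be the iframe's localStorage in practice; anything with setItem/getItem works, which also makes the scheme easy to test.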
All in one tool.py + one SKILL.md. No Open WebUI core patches.
Install
- Paste tool.py into Workspace → Tools
- Paste SKILL.md into Workspace → Knowledge as a skill named visualize
- Attach both to your model, native function calling on
- Settings → Interface → enable "Allow iframe same origin" (required — observer needs it to read chat DOM)
GitHub + README + demo video + screenshots