Shipped an Open WebUI plugin that lets your local model render interactive visualizations inline in chat — painted live, token-by-token, as the model generates.
Not static. Not "the diagram appears when the response finishes." The SVG literally assembles itself in front of you as tokens stream in. Cards appear one at a time. Chart.js bars populate column by column. First elements render within ~50ms of the opening marker.
How it works
The tool mounts an empty iframe. The model then emits HTML/SVG between plain-text @@@VIZ-START ... @@@VIZ-END markers in its response. A same-origin iframe observer tails the parent chat DOM, extracts the growing block, runs it through a safe-cut HTML parser, and reconciles new nodes into the iframe as tokens arrive.
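The extraction step can be sketched as follows (the marker strings come from the post; the function and field names here are illustrative, not the tool's actual API):

```javascript
// On each observer tick, pull the still-growing viz block out of the
// streamed message text. While the stream is mid-flight, the END marker
// may not have arrived yet, so we take everything after START so far.
const START = "@@@VIZ-START";
const END = "@@@VIZ-END";

function extractVizBlock(messageText) {
  const start = messageText.indexOf(START);
  if (start === -1) return null;               // no block started yet
  const bodyStart = start + START.length;
  const end = messageText.indexOf(END, bodyStart);
  const body = end === -1
    ? messageText.slice(bodyStart)             // still streaming
    : messageText.slice(bodyStart, end);       // block complete
  return { body, complete: end !== -1 };
}
```

Each call returns the current prefix of the block, which is then handed to the safe-cut parser described below.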
Why it's not trivial
Streaming partial HTML into a live iframe without breaking it is harder than re-assigning innerHTML with the partial markup on every chunk. The naive approach gives constant flicker, retriggered animations, and scripts executing before their dependencies load.
The safe-cut HTML parser tracks tokenizer state across TEXT / TAG / ATTR / script-data-escape / CDATA transitions and flushes the longest valid prefix on each chunk. This matters because models emit <script> tags inside SVG containing <!-- and <script as string literals; a naive cut would corrupt the enclosing script boundary.
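A drastically simplified sketch of the idea (the real tokenizer also handles attribute-value states, script-data-escape, comments, and CDATA, none of which are modeled here):

```javascript
// Return the length of the longest prefix of `html` that is safe to flush:
// never cut inside an open tag, and hold back <script> content entirely
// until its closing tag has arrived, so partial script text never executes.
function safeCutPoint(html) {
  let state = "TEXT"; // TEXT | TAG | SCRIPT
  let safe = 0;       // longest flushable prefix seen so far
  let tagStart = 0;
  for (let i = 0; i < html.length; i++) {
    const ch = html[i];
    if (state === "TEXT") {
      if (ch === "<") { state = "TAG"; tagStart = i; }
      else safe = i + 1;                         // plain text: always safe
    } else if (state === "TAG") {
      if (ch === ">") {
        const tag = html.slice(tagStart, i + 1).toLowerCase();
        if (/^<script[\s>]/.test(tag)) {
          state = "SCRIPT";                      // hold back until </script>
        } else {
          state = "TEXT";
          safe = i + 1;                          // completed non-script tag
        }
      }
    } else { // SCRIPT: ignore everything (even "<!--") until the close tag
      if (ch === ">" && html.slice(0, i + 1).toLowerCase().endsWith("</script>")) {
        state = "TEXT";
        safe = i + 1;                            // whole script flushed at once
      }
    }
  }
  return safe;
}
```

Note how a "<!--" string literal inside the script body is inert here, because SCRIPT state only watches for the closing tag; that is exactly the failure a naive cut runs into.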
The incremental DOM reconciler parses each safe-cut chunk into a detached tree and walks it in parallel with the live tree, appending only new nodes. Existing nodes never re-mount, so animations don't retrigger and scroll position holds through 10k-line SVGs.
Local model compatibility
Any model that can reliably follow "emit text between these two markers and use this design system" works. Tested on:
- GPT-OSS 120B — solid
- Qwen 3.5 27B — I found it surprisingly good myself, and multiple users confirmed as much in the v1 thread
- Haiku 4.5 / Sonnet 4.5 / Opus 4.7 / GPT-5.4 — reference frontier models
The skill file (shipped alongside the tool) teaches the model the protocol, design system, color ramps, SVG boilerplate, sendPrompt patterns, and common failure modes. The model doesn't have to invent any of that — it just follows the spec.
Latency
The tool call itself takes ~50ms (it just builds the HTML wrapper and CSP headers). All render cost is paid during token generation, in parallel with it. No separate rendering phase.
What you actually use it for
- Clickable architecture diagrams (click a component → model explains it)
- Interactive quizzes that grade themselves via a sendPrompt bridge
- Dashboards generated from Python tool output
- Live study cards with sliders for inference params
- Periodic tables where clicking an element drills down
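The self-grading quiz pattern is worth a sketch. `sendPrompt` is the bridge named in the post, but its exact signature is an assumption here, as is all the glue code:

```javascript
// Hypothetical glue for a self-grading quiz inside a generated viz:
// collect the picked answers into a prompt and hand it back to the model.
function buildGradingPrompt(answers) {
  return "Grade this quiz attempt:\n" +
    answers.map((a, i) => `Q${i + 1}: ${a}`).join("\n");
}

// In the rendered viz this would be wired to a button, e.g. (assumed API):
//   button.onclick = () => sendPrompt(buildGradingPrompt(collectAnswers()));
```

The point is the round trip: the visualization doesn't grade anything itself, it just feeds the user's selections back into the chat as a new prompt.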
Other stuff included
- Six JS bridges (sendPrompt, openLink, copyText, toast, saveState, loadState — the last two are per-message-scoped localStorage)
- 9-ramp color system with automatic light/dark adaptation
- Three CSP levels (strict / balanced / none)
- 230 localized strings across 46 languages
- Done toast + optional chime on stream finalize (only fires on witnessed streams, silent on refreshes)
- 30s idle-finalize fallback for stream stalls
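The per-message scoping of saveState/loadState could work roughly like this. Only the bridge names come from the post; the key format, constructor, and storage interface are assumptions for illustration:

```javascript
// Namespace storage keys by message id so each visualization's state
// is isolated from every other message's, even in the same chat.
function makeStateBridge(messageId, storage) {
  const scoped = (k) => `viz:${messageId}:${k}`;
  return {
    saveState: (k, value) => storage.setItem(scoped(k), JSON.stringify(value)),
    loadState: (k) => {
      const raw = storage.getItem(scoped(k));
      return raw === null ? null : JSON.parse(raw);
    },
  };
}
```

`storage` would be the iframe's localStorage in practice; anything with setItem/getItem works, which also makes the scheme easy to test.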
All in one tool.py + one SKILL.md. No Open WebUI core patches.
Install
- Paste tool.py into Workspace → Tools
- Paste SKILL.md into Workspace → Knowledge as a skill named visualize
- Attach both to your model, native function calling on
- Settings → Interface → enable "Allow iframe same origin" (required — observer needs it to read chat DOM)
GitHub + README + demo video + screenshots