cs.AI, cs.CV

Steering Visual Generation in Unified Multimodal Models with Understanding Supervision

arXiv:2605.05781v1 Announce Type: new
Abstract: Unified multimodal models are envisioned to bridge the gap between understanding and generation. Yet, to achieve competitive performance, state-of-the-art models adopt largely decoupled understanding and…