cs.CV

Improving Controllable Generation: Faster Training and Better Performance via $x_0$-Supervision

arXiv:2604.05761v1 Announce Type: new
Abstract: Text-to-Image (T2I) diffusion/flow models have recently achieved remarkable progress in visual fidelity and text alignment. However, they remain limited when users need to precisely control image layouts…