Daeun Lee, Jaehong Yoon, Jaemin Cho, Mohit Bansal

Self-Correcting Text-to-Video Generation with Misalignment Detection and Localized Refinement

Daeun Lee, Jaehong Yoon, Jaemin Cho, Mohit Bansal / April 21, 2026

arXiv:2411.15115v3
Abstract: Recent text-to-video (T2V) diffusion models have made remarkable progress in generating high-quality videos. However, they often struggle to align with complex text prompts, particularly when m…
