Self-Correcting Text-to-Video Generation with Misalignment Detection and Localized Refinement
arXiv:2411.15115v3 Announce Type: replace-cross
Abstract: Recent text-to-video (T2V) diffusion models have made remarkable progress in generating high-quality videos. However, they often struggle to align with complex text prompts, particularly when m…