MMControl: Unified Multi-Modal Control for Joint Audio-Video Generation
arXiv:2604.19679v2
Abstract: Recent advances in Diffusion Transformers (DiTs) have enabled high-quality joint audio-video generation, producing videos with synchronized audio within a single model. However, existing controllable…