Diagnosing and Correcting Concept Omission in Multimodal Diffusion Transformers
arXiv:2605.14270v1 Announce Type: new
Abstract: Multimodal Diffusion Transformers (MM-DiTs) have achieved remarkable progress in text-to-image generation, yet they frequently suffer from concept omission, in which objects or attributes specified in the prompt fail to …