Transformers Learn the Optimal DDPM Denoiser for Multi-Token GMMs
arXiv:2604.10074v1
Abstract: Transformer-based diffusion models have demonstrated remarkable performance at generating high-quality samples. However, our theoretical understanding of the reasons for this success remains limited. For…